Wayne@OZ.AI.MIT.EDU (Wayne McGuire) (11/18/87)
[Drexler presented a stimulating talk at the MIT Media Lab yesterday on hypertext, which quickly developed into a discussion about how to filter out all the junk which would be attached by billions of people to trillions of documents, and to zoom in on precisely that small set of information which is most valuable for one's purposes. Drexler is seeking to make an important distinction between micro-hypertext--for instance, programs like Hypercard whose domain is the information space of an individual user--and macro-hypertext, whose domain is the information space of the entire world. I agree with him that the latter technology is far more interesting than the former. (Let me emphasize that the terms "micro-hypertext" and "macro-hypertext" are my own invention, and may not be properly descriptive.) Following is a message about his talk to another list, the context of which should be obvious. -- WHM]

Drexler's hypertext talk seemed to serve mainly as an occasion to discuss the problem of filtering the deluge of often trivial information which the creation of a global hypermedia network will inevitably exacerbate by many orders of magnitude. A few thoughts:

The filtering problem might best be solved by regular communication between two agents: (1) one's intelligent personal assistant, which runs continuously and automatically in the background on one's local machine, monitoring, analyzing, and weighting one's attentional patterns, interests, and cognitive styles and capacities, and (2) a global superintelligence which is stocked full of algorithms combining the best insights and rules of thumb from the best minds in all fields for measuring the worth of, and prioritizing, new information in general and within specific domains. The personal assistant would build an ever-evolving model of its master's mind and periodically communicate it to the global intelligence; the global intelligence, in turn, would recommend from the set of all information in the world the best set of information (in its wise estimation) to satisfy the needs, and maximize the personal development, of the user, taking into account the present state of his or her knowledge, resources, etc.

This scheme might really not be as far-fetched as it sounds. Even now it would be feasible to write a program running as a background meta-task which would monitor a user's computing activity and determine, say, that Walter or Mary is more interested in Qlisp than hypertext, and even more interested in multiprocessing micros than Qlisp. A simple analysis of word frequencies would suffice (a sketch follows below), but one can imagine even more sophisticated algorithms to fine-tune the cognitive and attentional profile. Walter or Mary would probably not know all the language forms in which information about multiprocessing micros is expressed, or all the sources of information in the world on the topic ranked by value, but the global intelligence most certainly would.

The basic elements of a global intelligence for information evaluation are already in place. Consider, for instance, the mammoth citation indexes produced by the Institute for Scientific Information in Philadelphia. ISI has developed formulas for measuring the citation frequencies and citation impacts of authors, works, serials, and organizations. The presumption is that an object with a high citation impact might be more worthwhile to pay attention to than one with a low citation impact.
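[A minimal sketch, in Python, of the word-frequency profiling described above. The topics, term lists, and sample documents here are hypothetical, and a real assistant would of course use far richer signals than raw term counts.]

from collections import Counter
import re

# Hypothetical topics and the terms taken to signal interest in them.
TOPIC_TERMS = {
    "hypertext": {"hypertext", "hypermedia", "link", "node"},
    "qlisp": {"qlisp", "futures", "parallel", "lisp"},
    "multiprocessing micros": {"multiprocessor", "micro", "shared", "cache"},
}

def tokenize(text):
    """Lower-case word tokens from a document the user has read."""
    return re.findall(r"[a-z]+", text.lower())

def interest_profile(documents):
    """Count topic-term occurrences across everything the user has read."""
    counts = Counter()
    for doc in documents:
        words = Counter(tokenize(doc))
        for topic, terms in TOPIC_TERMS.items():
            counts[topic] += sum(words[t] for t in terms)
    return counts

if __name__ == "__main__":
    read_today = [
        "Notes on shared-memory multiprocessor micros and cache coherence.",
        "A Qlisp primer: futures and parallel evaluation in Lisp.",
        "More shared-memory multiprocessor micro benchmarks.",
    ]
    for topic, score in interest_profile(read_today).most_common():
        print(f"{topic:25s} {score}")

[Run on the three sample documents, this ranks "multiprocessing micros" above "qlisp" and both above "hypertext" -- just the kind of crude ordering a personal assistant could report to a global advisor.]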
Another approach in citation analysis has been to automatically uncover networks and families of authors, works, serials, and organizations through co-citation analysis. Objects which are often co-cited with other objects are usually closely connected conceptually. Citation analysis is only one of many methods that could be integrated into a global information evaluator that would make gentle recommendations to a personal assistant offering it a user profile.

Of course, the privacy issue will be raised by many. The simple solution is to turn off your personal assistant, or leave it on but don't let it talk to the global brain. The bottom line in all of this is that by maximizing the personal development of each individual, the full potential of society as a whole can be fulfilled to the benefit of everyone.

Marvin at Drexler's talk referred to an information retrieval system (Indexor?), developed by David Waltz's group at Thinking Machines, which locates other documents out in information space which, by certain complex criteria, closely resemble a given document at hand. Does this program only run on Connection Machines? Will a smaller version of it be developed for personal computers?

Wayne
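[A small Python sketch of co-citation counting in the same spirit. The citing papers and cited works below are hypothetical; real citation indexes of course run this kind of pass over millions of documents.]

from collections import Counter
from itertools import combinations

# Hypothetical citing papers, each reduced to the set of works it cites.
CITING_PAPERS = [
    {"Bush45", "Engelbart62", "Nelson65"},
    {"Bush45", "Nelson65"},
    {"Engelbart62", "Nelson65", "vanDam71"},
    {"Bush45", "Nelson65", "vanDam71"},
]

def co_citation_counts(papers):
    """Count how many citing papers mention each pair of works together."""
    pairs = Counter()
    for refs in papers:
        for a, b in combinations(sorted(refs), 2):
            pairs[(a, b)] += 1
    return pairs

if __name__ == "__main__":
    for (a, b), n in co_citation_counts(CITING_PAPERS).most_common():
        print(f"{a} -- {b}: co-cited {n} times")

[Here "Bush45" and "Nelson65" come out as the most strongly co-cited pair, which is the signal a global evaluator would read as conceptual relatedness.]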
mt@MEDIA-LAB.MEDIA.MIT.EDU (Michael Travers) (11/19/87)
Intelligent filtering is an idea that's been around for a while. I'm sure it will be useful and probably necessary, but as a concept it seems to bypass the fundamental idea of hypertext. Filtering implies a single undifferentiated stream of messages, with a person or a process picking out the interesting ones using some set of criteria. This is how e-mail works now--all messages are collected into a serial ordering in your mailbox, and you provide the filtering (some mail readers provide some help in this).

But there is no good reason to collapse a hypertext into a stream in the first place! A hypertext is a network, and if it is densely connected, you should rarely have a need to do global searches on it. Instead, you use your favorite index to find an entry point to the area you are interested in, and chase references from there. Browsing, or spreading activation, is a better metaphor than filtering here.

Local filtering on references might be useful (i.e., if you are doing some form of cognitive science and believe that neuroscience has little to offer, you might have a rule that says "be less interested in any articles if the journal name contains the string 'neuro'.") But global filtering will be mostly unnecessary if the hypertext is any good. It's a method for dealing with the non-hypertext-ness of current media, and not the thing to have foremost in mind when thinking about hypertext systems.
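[A minimal Python sketch of the local filtering rule described above; the journal names and weight are hypothetical. Note that the rule only lowers an item's rank rather than hiding it.]

# Each candidate reference is a (title, journal) pair; rules adjust a score
# rather than discarding items outright ("be less interested", not "ignore").
def local_filter(references, down_weight_substrings=("neuro",)):
    scored = []
    for title, journal in references:
        score = 1.0
        if any(s in journal.lower() for s in down_weight_substrings):
            score *= 0.25  # still reachable, just ranked lower
        scored.append((score, title, journal))
    return sorted(scored, reverse=True)

if __name__ == "__main__":
    refs = [
        ("Mental models in discourse", "Cognitive Science"),
        ("Cortical maps revisited", "Journal of Neurophysiology"),
        ("Frames and scripts", "Artificial Intelligence"),
    ]
    for score, title, journal in local_filter(refs):
        print(f"{score:4.2f}  {title}  ({journal})")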
Wayne@OZ.AI.MIT.EDU (Wayne McGuire) (11/19/87)
Even after a robust global hypermedia network is brought into being--one which richly interlinks the full semantic and propositional content of all the texts and digitized audio/visual works in the world, including all the informal and spontaneous nth levels of commentary on primary works by anyone and everyone--one will still require an intelligent filter to make this information manageable, to access it and use it most productively, and not get buried by trivia. Perhaps powerful filters, albeit local not global, will be required _especially then_, even more than now.

Even if one is stationed on a fairly specialized node of hypermedia knowledge space--let's say parallel processing programming languages--each day is likely to bring into one's mailbox or dynabook far more items, and pointers to items, and pointers to pointers to items, etc. than anyone would be able usefully to sort through and prioritize with the aim of reading carefully even a small percentage of the take. In this situation a global superintelligence and information evaluator would be helpful in deciding which handful of the hundreds or thousands of links and pointers attached to a given item is, from the cognitive context of a particular user, worth tracking down in depth. Hypermedia will not alter the fundamental human constraint that we read words and documents serially, one after another, and that while the volume of new information is exploding, the time in which to select and meaningfully absorb knowledge from the world remains constant.

Another thought on how a global intelligence might use user models and profiles assembled by personal assistants: any GM (Global Mind) worth its salt, and able to learn from its experience, would be able to say in the case of Person X: I've seen nearly 5,000 cases like this one before; by abstracting all the knowledge from those previous cases, there is a high probability that X needs or wants Y, but doesn't yet know it and wouldn't know how to get Y if he or she even knew that Y was required and available. I know exactly the best way to open X's mind to the knowledge that Y is probably what he or she needs to pay attention to now to get on with the next stage of development.

Of course, the privacy issue still looms large in all this, and the potential for abuse (as in the thought recognition research you earlier alerted us to) is enormous. One should always have the option simply to say no to cooperating with a personal assistant which in turn is cooperating with a global supermind. In a worst-case scenario one's micro could become one's figurative jailer, the oppressive agent of a police state, and not really your good buddy in the quest for self-realization.

Yet another thought: each day now in many settings, from global networks to small BBS's, thousands of email messages are being exchanged, many of which are groping in deep ignorance on sundry topics. The person who left a message on a Virginia BBS yesterday (this actually happened) revealing a misunderstanding of cryptographic techniques doesn't know that someone somewhere else in the world, on a network that is invisible to him, left a message on the same day and on the same topic that dispels this ignorance. One can see possibilities in these situations for an automated Global Referee (to be turned on or off at will, of course), specializing in spreading the light and tearing down walls of ignorance. How about a standard hypermedia link/property: is-more-authoritative-than?

Wayne
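[One way to picture the proposed link type, as a small Python sketch; the node names and links are hypothetical. Each link asserts that its first node is-more-authoritative-than its second, and a reader's filter can sort attached commentary by how much each node transitively outranks.]

from collections import defaultdict

# Directed "is-more-authoritative-than" links between hypothetical nodes.
LINKS = [
    ("crypto-faq-entry", "bbs-msg-virginia"),
    ("survey-article", "crypto-faq-entry"),
]

def authority_order(links):
    """Order nodes by how many others they outrank, following links transitively."""
    out = defaultdict(set)
    for a, b in links:
        out[a].add(b)

    def reachable(node, seen=None):
        seen = set() if seen is None else seen
        for nxt in out[node]:
            if nxt not in seen:
                seen.add(nxt)
                reachable(nxt, seen)
        return seen

    nodes = {n for link in links for n in link}
    return sorted(nodes, key=lambda n: len(reachable(n)), reverse=True)

if __name__ == "__main__":
    print(authority_order(LINKS))  # most authoritative node first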
madd@BUCSF.BU.EDU.UUCP (11/19/87)
[About intelligent filtering]

|But there is no good reason to collapse a hypertext into a stream in
|the first place! A hypertext is a network, and if it is densely
|connected, you should rarely have a need to do global searches on it.
|Instead, you use your favorite index to find an entry point to the
|area you are interested in, and chase references from there.
[...]
|Local filtering on references might be useful [...]
|But global filtering will be mostly unnecessary if the hypertext is
|any good. It's a method for dealing with the non-hypertext-ness of
|current media, and not the thing to have foremost in mind when
|thinking about hypertext systems.

I think this is true. Consider an example: (relatively) recently the Grolier Encyclopaedia was placed on CD-ROM. A large amount of processing time went into generating a fantastic cross-reference for the encyclopaedia. While this isn't hypertext by definition, you can see how the idea applies; the computer was used to generate the cross-links that would have been user-generated in a hypertext environment. Anyway, the cross-reference ended up being about the size of the encyclopaedia but made it possible to find even obscure references in only seconds WITHOUT A GLOBAL SEARCH. This encyclopaedia dealt with only a few megabytes (200? something like that), though; it would be interesting to see what would happen if you're dealing with several magnitudes of that amount.

I suppose the only problem with hypertext is that the user might not generate the extensive links that you might like, either through laziness or ignorance. In any case this problem will become more serious as your database grows larger, so you'll probably need either an automatic link generator or some sort of global search mechanism to help find items that were not properly linked. I'd opt for the automatic link generator, since it'd be easier to search a database of link topics than the entire database!

jim frost
madd@bucsb.bu.edu
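[A minimal Python sketch of the automatic link generator idea above, assuming every document has already been reduced to a small set of topic terms (the "database of link topics"); only that topic index is ever searched, never the full text. Document names and topics are hypothetical.]

from collections import defaultdict

# Hypothetical documents reduced to topic-term sets.
DOC_TOPICS = {
    "doc-a": {"hypertext", "indexing"},
    "doc-b": {"cd-rom", "indexing", "encyclopaedia"},
    "doc-c": {"hypertext", "browsing"},
}

def build_topic_index(doc_topics):
    """Invert the topic sets: topic -> documents mentioning it."""
    index = defaultdict(set)
    for doc, topics in doc_topics.items():
        for topic in topics:
            index[topic].add(doc)
    return index

def propose_links(doc_topics):
    """Propose a link between any two documents sharing at least one topic."""
    index = build_topic_index(doc_topics)
    links = set()
    for docs in index.values():
        for a in docs:
            for b in docs:
                if a < b:
                    links.add((a, b))
    return links

if __name__ == "__main__":
    for a, b in sorted(propose_links(DOC_TOPICS)):
        print(f"{a} <--> {b}")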
dm@BFLY-VAX.BBN.COM.UUCP (11/19/87)
Perhaps I'm stating the obvious, but these referees sound very much like two things:

1) a good reference librarian.
2) a good editor of an eclectic journal (e.g., Harper's or the Atlantic).

Indeed, when one is just ``interested in things'' one goes to a magazine like Harper's and browses (Harper's and the Utne Reader are particularly well suited for this). I would expect that hypermedia will have the equivalent: people who prospect the fields of hypermedia and leave behind a trail that others can follow to the gold. Ted Nelson's hypertext project, Xanadu, devotes a great deal of attention to royalties in order to encourage this practice. Until computer programs are interesting companions in their own right, I'll bet people will be able to do this better.

When one is researching a particular project, the services of a good research librarian are invaluable. Research librarians come in two flavors: generalists (the kind you'll find at the public library) and specialists (the kind you'll find in a university department's reading room or library). These people are experts at gleaning information from the library. They spend many years learning their trade and learning their library. In designing a hypertext filter, it is the expertise of these people that you'll want to tap.

I expect that in the hypertext morass there will still be people you go to whose expertise and advice will guide you through the twisty maze of hypertext links to the valuable information. Those people will develop tools to bring to bear to help you find your way to your goal (there's a REASON it's called library SCIENCE (though it should probably be called ``library engineering'')).

Now THERE would be an expert system to develop...
FRUIN@HLERUL5.BITNET (Thomas Fruin) (11/20/87)
> From: Wayne McGuire <Wayne%OZ.AI.MIT.EDU@XX.LCS.MIT.EDU>
> Subject: Filtering A Global Hypermedia Network

What is the rationale for bringing a "global superintelligence" in to solve the filtering problem for a global hypermedia network? There are _so_ many disadvantages of having one centralized body: impracticality due to size, reliability (what if the thing goes down), and the issue of privacy you already mentioned.

Of course "turning off your personal assistant" if you are worried about privacy is no solution at all. That's like solving the problem of car accidents by refraining from driving.

-- Thomas Fruin
   fruin@hlerul5.BITNET
   thomas@uvabick.UUCP
   2:500/15 on FidoNet
   Leiden University, Netherlands
Wayne@OZ.AI.MIT.EDU.UUCP (11/20/87)
> Date: Fri, 20 Nov 87 01:36 N
> From: <FRUIN%HLERUL5.BITNET@BUACCA.BU.EDU> (Thomas Fruin)
>
> What is the rationale for bringing a "global superintelligence" in to
> solve the filtering problem for a global hypermedia network? There are
> _so_ many disadvantages of having one centralized body: impracticality
> due to size, reliability (what if the thing goes down), and the issue
> of privacy you already mentioned.

Impracticality due to size: with nanotechnology and Crays that will fit in pocket watches or teeth?

Reliability: why can't a global mind or global hypermedia advisor replicate itself each day and be distributed by fiber optic or superconductive links in multiple copies throughout all the cities in the world? If one goes down, just turn on another.

Privacy: yes, a serious problem, but you should realize that we already leave behind us a large and detailed digital trail which profiles our most intimate habits of mind. Many large corporations and government agencies can access and manipulate that data now. Your privacy is already long gone.

So why would one want a global hypermedia advisor? For the same reasons, I suppose, that most of us would rather take advantage of the resources of the Library of Congress or Harvard's Widener Library than those of our local public library: knowledge and power. It's a basic human drive.

Wayne
Wayne@OZ.AI.MIT.EDU.UUCP (11/20/87)
> Date: Thu, 19 Nov 87 09:14:57 est
> From: madd@bucsf.bu.edu (Jim Frost)
>
> Anyway, the cross-reference ended up being about the size of the
> encyclopaedia but made it possible to find even obscure references in
> only seconds WITHOUT A GLOBAL SEARCH....

The power of the indexing system for the Grolier CD-ROM lies precisely in the fact that it IS based on a global analysis of the total text. An intelligent agent (no doubt a few people armed with computers and the appropriate software) scanned the entire text for conceptual links. Any users of the Grolier CD-ROM are taking advantage of an indexing scheme built on this global preprocessing. You don't have to conduct a global search, because someone has already done it for you, although no doubt your or my personal global analysis would turn up radically different links than did Grolier's editors.

ISI's citation indexes, which cover the majority of the world's scientific literature, are also based on a global scan, in this case of millions of documents. It is impossible to predict what journal in what domain will refer to a given document, and so it is necessary to analyze (nearly) all the scientific journals in the world to uncover citation links.

Never underestimate the necessity for or power of global analysis. Any local structure is only as robust as its knowledge of the entire world. Presumably a global hypermedia advisor would be very robust indeed.

Wayne
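[The point about global preprocessing can be restated in a few lines of Python: one global pass over every article builds an inverted index, after which any individual lookup is a cheap local operation. The article titles and text here are hypothetical.]

from collections import defaultdict
import re

ARTICLES = {
    "aardvark": "The aardvark is a nocturnal burrowing mammal of Africa.",
    "antelope": "Antelope are swift-running mammals of Africa and Asia.",
    "glacier": "A glacier is a slowly moving mass of ice.",
}

def build_index(articles):
    """Global preprocessing: scan every article once, mapping word -> titles."""
    index = defaultdict(set)
    for title, text in articles.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(title)
    return index

if __name__ == "__main__":
    index = build_index(ARTICLES)   # done once, up front, over the whole text
    print(sorted(index["africa"]))  # each later query is a single lookup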
FRUIN@HLERUL5.BITNET.UUCP (11/20/87)
> Date: 20 Nov 1987 05:18 EST (Fri)
> From: Wayne McGuire <Wayne%OZ.AI.MIT.EDU@XX.LCS.MIT.EDU>
> Subject: Filtering A Global Hypermedia Network

If communications speeds are going to be so much higher, what's the point in cramming everything into one big hypermedia adviser? I thought the way of the future was networking. A more likely prospect is that each person's advisor queries several databases around the world and copies whatever relevant information it finds there. Big centralized systems will always stay slow, impractical, and unreliable, because with the advancement of technology the amount of digitized information is growing at an ever faster rate.

> Privacy: yes, a serious problem, but you should realize that we
> already leave behind us a large and detailed digital trail which
> profiles our most intimate habits of mind. Many large corporations
> and government agencies can access and manipulate that data now. Your
> privacy is already long gone.

You're very cynical here, and maybe you are right. I want to think there is still hope, though, and in that case a centralized hypermedia advisor is not the way to go. There is a big difference between leaving behind a _public_ digital trail (like messages in newsgroups) and a trail that "profiles our most intimate habits of mind". What do you mean by that?

In Holland a new law will soon take effect regarding databases that store information about people. Its basic premise is that a database should have a GOAL, i.e. to send you your electricity bill or to keep track of your car's registration number. It is FORBIDDEN to match or combine any two databases that don't have the same goal. You can take anybody to court who does so anyway. This should make it very hard for corporations and government agencies to access any information about you.

-- Thomas Fruin
   fruin@hlerul5.BITNET
   thomas@uvabick.UUCP
   2:500/15 on FidoNet
   Leiden University, Netherlands
bzs@BU-CS.BU.EDU (Barry Shein) (11/20/87)
From: <FRUIN%HLERUL5.BITNET@BUACCA.BU.EDU> (Thomas Fruin)
>If communications speeds are going to be so much higher, what's the point in
>cramming everything into one big hypermedia adviser? I thought the way of the
>future was networking.

I've had some conversations with folks here who are working on large hypertext projects, and some of them in fact do not believe the future is in networks at all. One major reason they cite is the inevitable frustration of dealing with the necessary central organization that would be running the network (and, of course, varying scepticism about the available bandwidth.)

The system of the future they envision would be something more like a desktop, high-speed multi-processor with CD-ROM readers and a nice stack of CD-ROMs (not unlike your current CD player.) People would buy sets of CDs to start collections (not unlike investing in a good encyclopaedia) and beyond that would either buy them in typical ways or subscribe to "CD of the month" clubs which might send you all of the previous month's journals w/in some field (or popular mags, whatever.) To be more up to date you might use a network to peruse very current stuff; it's not either/or, but the network may not be a critical component.

Another very important point that was stated was: how do you make money on networks? Connect charges? Access charges, etc.? Nuisance service organizations and open-ended costs, blech. Notice all the hostility towards the phone company? People will leap at alternatives like private collections. You get what you want, when you want it, and you (the service org) don't have to figure out how to get everything on-line at all times (that is, analogous to the reason that VCRs sell better than attempts at pay-per-view cable services.) There's far more money (they claim) to be made in selling everyone their own copies of the stuff, and that's where the "smart" money is going.

Remember, this is not so much an issue of what is possible (e.g. discussing suitably high-speed network technology) but where the MONEY is going to go for R&D. And there is some indication that it prefers the idea of publishing and sales to building service organizations. There's a very heavy socio-economic aspect here that cannot be overlooked.

-Barry Shein, Boston University
Wayne@OZ.AI.MIT.EDU (Wayne McGuire) (11/22/87)
A global hypermedia advisor doesn't need to be a big centralized system in the sense of storing the full text of all the documents in the world, but it should be a supreme index of indexes--a clearinghouse of pointers to pointers to pointers ad infinitum to all the information chunks and information chunk types (including the full text of documents and document elements) on all the networks in the world. An analogy might be the Harvard Union Catalog, which stores easily accessible pointers to all the works in the many libraries in the Harvard library network. But a GHA would be much more powerful than, say, the HUC, since it would embody the best knowledge of the best experts in the world about the conceptual structures of their domains.

I am not being cynical about privacy, merely realistic. Regarding "intimate habits of mind": certainly one's banking and telephone records, which chronicle in exquisite detail what one buys and with whom one communicates, provide an in-depth psychological profile to the eye of an acute analyst. Holland and other nations may be passing laws to restrict access to these records in the usual case, but the security and intelligence establishments of most of these countries can find loopholes and exceptions in these laws through which to drive fleets of Mack trucks.

As a general rule, whatever flows through a telecommunications channel should not be considered private. James Bamford in _The Puzzle Palace_ outlines the methods of the NSA for intercepting and analyzing global telecommunications. England's Government Communications Headquarters and the Soviet Union's KGB (or the Soviet equivalent to the NSA) are engaged in the same activities. They don't capture everything, but they get enough. They probably have as much regard for the spirit and letter of the public privacy laws as do drivers on the Massachusetts Turnpike for the 55 mph speed limit. As far as protecting your privacy from the general public, I assume that with a global hypermedia advisor one could choose how much of one's profile to make public, or one could choose not to interact with the system at all.

Current online database vendors like Dialog and Mead Data Central are already foreshadowings (albeit extremely primitive) of a GHA. It is interesting to recall that under the reign of John Poindexter, of Irangate fame, the NSC was seeking to gain legal access to the records of these companies, which store sensitive information about the search targets and patterns of their users. As I recall, the NSC was denied legal access by Congress, but then there is always the problem of illegal access, which is relatively trivial to accomplish wholesale by intercepting telecommunications.

Wayne
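[A very rough Python sketch of the "index of indexes" idea from the start of this message: the advisor stores no documents at all, only pointers held by other indexes, which it queries and merges. All source names and pointer formats here are hypothetical.]

# Each regional index maps a topic to pointers (locations of documents),
# never to the documents themselves.
REGIONAL_INDEXES = {
    "harvard-union-catalog": {"hypertext": ["widener:QA76.76.H94"]},
    "dialog": {"hypertext": ["dialog:file-202:rec-8841"]},
    "isi-citation-index": {"hypertext": ["isi:sci-1987:doc-55120"]},
}

def global_advisor(topic, indexes=REGIONAL_INDEXES):
    """Merge pointers from every known index; return pointers, never full text."""
    pointers = []
    for source, idx in indexes.items():
        for pointer in idx.get(topic, []):
            pointers.append((source, pointer))
    return pointers

if __name__ == "__main__":
    for source, pointer in global_advisor("hypertext"):
        print(f"{source:22s} -> {pointer}")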