schwartz@ncar.UCAR.EDU (Michael Schwartz) (10/07/90)
I have made a number of papers about the Networked Resource Discovery Project available for anonymous FTP from latour.colorado.edu, in the directory pub/RD.Papers. The README file in that directory follows.

 - Mike Schwartz
   Dept. of Computer Science
   Univ. of Colorado - Boulder

-------------------------------------------------------------------------------

This directory contains several papers from the Networked Resource Discovery Project. Files named "*.ps.Z" are compressed PostScript files containing an entire paper. In some cases a paper is stored in a subdirectory that contains the paper and some separate figures (each as compressed PostScript files). This was necessary in cases where figures couldn't be put into a single PostScript file, either because the figure was created by a drawing package that didn't generate Encapsulated PostScript, or because the figures were very large.

You can either retrieve individual papers that interest you (see the abstracts below), or you can retrieve all the papers at once in ALLPAPERS.tar.Z. This is a compressed tar file containing everything in this directory (except this README file). Don't forget to set "type image" in ftp before retrieving the compressed files.

A brief overview of the project and papers follows. If you have any questions or comments, please direct them to Mike Schwartz, the Principal Investigator of the project:

	Mike Schwartz
	Dept. of Computer Science
	Univ. of Colorado
	Boulder, CO 80309
	(303) 492-3902
	schwartz@boulder.colorado.edu

-------------------------------------------------------------------------------

The Networked Resource Discovery Project is investigating means by which users can discover the existence of a variety of resources in an internet environment, such as network services, retail products, current events, data, and people in various capacities.
This problem falls within a larger vision of \fIdistributed collaboration\fR, or the accomplishment of tasks through sharing of resources among many interrelated individuals across administrative boundaries. We are particularly interested in resource discovery, because we see it as an enabling (and currently limiting) technology for distributed collaboration.

We impose three key goals on our approach to resource discovery. First, we are interested in very large environments, spanning national or international-sized networks. Such environments place severe scalability requirements on the algorithms that can be used. Second, we want to support searches without imposing artificial constraints on the resource space organization. Traditional directory services (such as the CCITT X.500 standard) rely on hierarchical organization to achieve good scalability. We wish to avoid a hierarchical search space. As a hierarchy is required to register an increasingly wide variety of resources, trying to search for resources becomes difficult, because the organization becomes convoluted and requires users to understand how its components are arranged. Finally, we wish to minimize the need for global administrative agreement over protocols and information formats. While standards help this process, it is quite difficult to specify standards that are both globally adopted and technologically current.

We are exploring a number of approaches to distributed collaboration and resource discovery. One technique involves using probabilistic algorithms to build and search a resource graph that supports attribute-based ("yellow pages") specifications, for which it is desirable to find a small number of instances of a large class of objects. The resource graph evolves over time in accordance with what resources exist, and the types of searches that users make.
Simulation results indicate that this approach can support non-hierarchical searches for an environment roughly the size of a country, with several thousand administrative domains participating in resource registration and searches.

A second technique involves building an understanding of the semantics of particular resource discovery applications into the algorithms that support searches. Using this technique we have built and experimented extensively with an Internet "white pages" directory tool (called "netfind") capable of locating over 1,100,000 people in 1,900 sites around the world. We have distributed netfind on a limited basis to approximately 50 researchers around the world, who are using it actively. Distribution is currently on hold, pending further development.

Another study used graph-theory and traffic analysis techniques to analyze electronic mail communication patterns among approximately 50,000 persons in 3,700 different sites around the world. In addition to the basic graph measurements, this study produced an algorithm that has potential applications for distributed collaboration, as well as privacy implications for electronic mail.

Another subproject involves supporting resource discovery among the vast array of resources available at public archives at tens of thousands of sites around the Internet. We have built a prototype implementation based on an exploratory resource discovery paradigm, in which users contribute to a distributed global resource space "map" as they discover new resources, using a range of different information sources of varying degrees of quality. We are currently developing this prototype further, so that we can distribute it to other sites around the Internet, and attempt to build a map of Internet public archives. A longer range goal is to define a new Internet protocol for supporting large scale distributed collaboration.
Finally, we are beginning work on a subproject to use resource discovery techniques to support a visual interface to network management for the global Internet, to allow users to observe network characteristics such as topology, geographical layout, protocol usage, loading, and congestion. A key technique involves using a number of information sources and protocols, to support discovery in the absence of global agreement on any one protocol or information source. This approach stands in contrast to relying on a single standard, such as SNMP. We believe this approach is important in large scale, administratively decentralized environments, in which it is difficult to reach global agreement or full deployment of a single standard.

A list of project papers follows:

%A M. F. Schwartz
%T Autonomy vs. Interdependence in the Networked Resource Discovery Project
%O Position paper, ACM SIGOPS European Workshop, Cambridge, England
%D September 1988
%X Available for anonymous FTP from latour.colorado.edu in the file pub/RD.Papers/Auton.vs.Interdep.Wkshop.ps.Z

%A M. F. Schwartz
%T The Networked Resource Discovery Project
%J Proceedings of the IFIP XI World Congress
%C San Francisco, California
%D August 1989
%P 827-832
%K Track on Communications and distributed systems
%X Available for anonymous FTP from latour.colorado.edu in the directory pub/RD.Papers/Early.Pjct.Descr
%X Abstract: "Large scale computer networks provide access to a bewilderingly large number and variety of resources, including retail products, network services, and people in various capacities. We consider the problem of allowing users to \fIdiscover the existence\fR of such resources in an administratively decentralized environment, using a system architecture that accesses the distributed collection of repositories that naturally maintain resource information. A key problem is organizing the resource space flexibly.
Rather than imposing a hierarchical organization, our approach allows the resource space organization to evolve in accordance with usage patterns. Concretely, a set of \fIagents\fR organize and search the resource space by constructing links between the repositories of resource information based on keywords that describe the contents of each repository, and the semantics of the resources being sought. The links form a general graph, with a flexible set of hierarchies embedded within the graph to provide some measure of scalability. The graph structure evolves over time through the use of cache aging protocols. Additional scalability is targeted through the use of probabilistic graph protocols. A simulation, prototype implementation, and measurement study are under way."

%A M. F. Schwartz
%A P. G. Tsirigotis
%T Experience with a Semantically Cognizant Internet White Pages Directory Tool
%J \fRTo appear\fP, Journal of Internetworking Research and Experience
%D 1990
%K Netfind
%X Available for anonymous FTP from latour.colorado.edu in the file pub/RD.Papers/White.Pages.ps.Z
%X Abstract: "As wide area networking technology and interconnection improve, an increasingly important problem is allowing users to navigate through the vast array of network accessible resources. In this paper we discuss experience with one technique we have developed in this regard, applied to a specific resource class. We have built a prototype tool that provides a simple Internet "white pages" directory facility. Given the name of a user and a rough description of where the user works (e.g., the company name or city), the tool attempts to locate telephone and electronic mailbox information about that user.
We estimate that the scope of the directory is upwards of 1,147,000 users in 1,929 administrative domains, yet the tool does not require the type of global cooperation that many existing or proposed directory services require, namely, running special directory servers at many sites around the Internet. We accomplish this by building an understanding of the semantics of this particular resource discovery application into the algorithms that support searches, allowing the tool to make aggressive use of existing sources of relatively unstructured information. Being able to make use of such information is important in heterogeneous, administratively decentralized environments, where global agreement about highly structured information formats is difficult to achieve. At present, the tool utilizes information from USENET news messages, the Domain Naming System, the Simple Mail Transfer Protocol, and the "finger" protocol, as well as a variety of information about the meaning of and relationships between these information sources. Other sources of resource information (such as the CCITT X.500 directory service) can easily be incorporated into the tool as they become available. The tool achieves good response time through the use of parallel queries."

%A M. F. Schwartz
%T A Scalable, Non-Hierarchical Resource Discovery Mechanism Based on Probabilistic Protocols
%R Technical Report CU-CS-474-90
%I Department of Computer Science, University of Colorado, Boulder, Colorado
%D June 1990
%O Submitted for publication
%K Yellow pages, YP
%X Available for anonymous FTP from latour.colorado.edu in the directory pub/RD.Papers/ProbYP
%X Abstract: "Computer network interconnection provides access to a bewildering array of resources, including databases, network services, and people in various capacities. We consider the problem of allowing users to discover the existence of such resources in a large scale, administratively decentralized environment.
While hierarchically organized resource registries have good scalability properties, they provide poor support for resource discovery, because users must understand how the nested components are arranged. In this paper we present a probabilistic approach that supports non-hierarchical, attribute based "yellow pages" searches. The protocols support locating a small number of instances of moderately large classes of objects. The resource graph evolves over time in accordance with what resources exist and the types of searches that users make. Simulation results indicate that the approach can support scalable and flexible resource discovery for an environment roughly the size of a large country, with several thousand administrative domains participating in resource registration and searches. Moreover, the probabilistic search strategy naturally supports fair access among competing information providers."

%A M. F. Schwartz
%A D. C. M. Wood
%T A Measurement Study of Organizational Properties in the Global Electronic Mail Community
%R Technical Report CU-CS-482-90
%I Department of Computer Science, University of Colorado, Boulder, Colorado
%D August 1990
%O Submitted for publication
%X Available for anonymous FTP from latour.colorado.edu in the directory pub/RD.Papers/Email.Study
%X Abstract: "Computer systems intended for use in large scale environments are typically organized according to rigid hierarchical structures. For example, traditional file and directory services rely on hierarchical organization to enhance scalability. Motivated by hierarchy's poor support for navigating among large, highly diverse collections of resources (the \fIresource discovery\fR problem), we have become interested in organizational structures that arise naturally when people collaborate. In this paper we explore the graph structure resulting from global electronic mail communication.
We characterize the structure through analysis of data collected about international electronic mail communication patterns among approximately 50,000 people in 3,700 different administrative domains. We define an \fIInterest Specialization Graph\fR structure that provides the scalability of a hierarchy without its organizational inflexibility. We believe that systems organized with this graph structure offer promise of better supporting the organizational needs of a large environment characterized by widespread interorganizational collaboration."

%A M. F. Schwartz
%A D. R. Hardy
%A W. K. Heinzman
%A G. Hirschowitz
%T Supporting Resource Discovery Among Public Internet Archives Using a Spectrum of Information Quality
%R Technical Report CU-CS-487-90
%D September 1990
%O Submitted for publication
%X Available for anonymous FTP from latour.colorado.edu in the file pub/RD.Papers/RD.For.Anon.FTP.ps.Z
%X Abstract: "Wide area networks offer access to an increasing number and variety of resources, such as documents, software, data, network services, and people. Yet, it is difficult to locate resources of interest, because of the scale and decentralized nature of the environment. We are interested in supporting a global confederation of loosely cooperating systems and users that share far more resources than can be completely organized. Therefore, mechanisms are needed to support incremental organization of the resources, based on the efforts of many geographically decentralized individuals, and a range of different information sources of varying degrees of quality. In this paper we describe a prototype implementation of a set of mechanisms intended to explore this problem in the specific domain of public Internet archives, accessible via the "anonymous" File Transfer Protocol. This is an interesting test case, because it encompasses a very large scale, administratively decentralized collection of resources, with considerable practical value.
The resource discovery paradigm is exploratory in nature, with users contributing to the global resource space organization as they discover new resources. At present, three levels of information quality are supported. At the highest level, resources are described using an archive-site-resident database, with individual resources described according to their conceptual roles. Below that, per-user and per-user-site caches are maintained, to record resources that have been found by individual users during their explorations. At the lowest level, the system monitors announcements of public archive availability from USENET electronic bulletin board articles, to provide a simple keyword-based index of resources throughout the global network."
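
The three levels of information quality named in the abstract above can be pictured as a tiered lookup. The following is only an illustrative sketch and not the project's implementation: the class name, method names, and data layout are all hypothetical, and the policy of consulting the highest-quality level first and falling through to lower ones is an assumption about how such levels might be combined.

```python
from dataclasses import dataclass, field


@dataclass
class TieredResourceIndex:
    """Hypothetical three-level index, highest information quality first."""
    # Level 1 (assumed layout): archive-site-resident resource descriptions.
    archive_db: dict = field(default_factory=dict)
    # Level 2: per-user / per-user-site cache of resources found while exploring.
    user_cache: dict = field(default_factory=dict)
    # Level 3: keyword index built from USENET archive announcements.
    usenet_keywords: dict = field(default_factory=dict)

    def lookup(self, keyword):
        """Return (level_name, locations) from the best level that knows keyword."""
        levels = (
            ("archive_db", self.archive_db),
            ("user_cache", self.user_cache),
            ("usenet_keywords", self.usenet_keywords),
        )
        for name, table in levels:
            if keyword in table:
                return name, table[keyword]
        # No level has heard of this keyword.
        return None, []

    def record_discovery(self, keyword, location):
        """Remember a resource a user found during exploration (middle level)."""
        found = self.user_cache.setdefault(keyword, [])
        if location not in found:
            found.append(location)
```

In this sketch a keyword known only from a news announcement is answered from the lowest level until some user records a discovery for it, after which the cache entry takes precedence. That is one simple way to model users contributing to the organization of the resource space as they explore.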