[comp.os.research] Resource Discovery papers available by anonymous FTP

schwartz@ncar.UCAR.EDU (Michael Schwartz) (10/07/90)

I have made a number of papers about the Networked Resource Discovery
Project available for anonymous FTP from latour.colorado.edu, in the
directory pub/RD.Papers.  The README file in that directory follows.
 - Mike Schwartz
   Dept. of Computer Science
   Univ. of Colorado - Boulder
-------------------------------------------------------------------------------
This directory contains several papers from the Networked Resource
Discovery Project.  Files named "*.ps.Z" are compressed PostScript
files containing an entire paper.  In some cases a paper is stored in a
subdirectory that contains the paper and some separate figures (each as
a compressed PostScript file).  This was necessary in cases where the
figures couldn't be merged with the paper into a single PostScript file,
either because a figure was created by a drawing package that didn't
generate Encapsulated PostScript, or because the figures were very
large.

You can either retrieve individual papers that interest you (see the
abstracts below), or you can retrieve all the papers at once in
ALLPAPERS.tar.Z.  This is a compressed tar file containing everything
in this directory (except this README file).

Don't forget to set "type image" (binary mode) in ftp before retrieving
the compressed files.
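
For readers who script their transfers, the following is a minimal
sketch of the same retrieval in Python, using the standard ftplib
module.  The function names fetch_binary and fetch_all_papers are
illustrative only and are not part of any project software; the host,
directory, and file name are the ones given above.  ftplib's retrbinary
switches the connection to image (binary) mode itself, so no separate
"type image" step is needed.

```python
from ftplib import FTP

HOST = "latour.colorado.edu"   # host named in the announcement
DIRECTORY = "pub/RD.Papers"    # directory named in the announcement


def fetch_binary(ftp, remote_name, local_path):
    """Retrieve remote_name in image (binary) mode into local_path."""
    with open(local_path, "wb") as out:
        # retrbinary sends TYPE I before RETR, which is what typing
        # "type image" (or "binary") does in the interactive client.
        ftp.retrbinary("RETR " + remote_name, out.write)


def fetch_all_papers(local_path="ALLPAPERS.tar.Z"):
    """Log in anonymously, enter the paper directory, fetch the tar file."""
    with FTP(HOST) as ftp:
        ftp.login()            # no arguments means anonymous login
        ftp.cwd(DIRECTORY)
        fetch_binary(ftp, "ALLPAPERS.tar.Z", local_path)
```

Calling fetch_all_papers() performs the whole transfer; fetch_binary
can be reused for any individual "*.ps.Z" file once logged in.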

A brief overview of the project and papers follows.  If you have any
questions or comments, please direct them to Mike Schwartz, the
Principal Investigator of the project:

	Mike Schwartz
	Dept. of Computer Science
	Univ. of Colorado
	Boulder, CO 80309
	(303) 492-3902
	schwartz@boulder.colorado.edu

-------------------------------------------------------------------------------

The Networked Resource Discovery Project is investigating means by
which users can discover the existence of a variety of resources in an
internet environment, such as network services, retail products,
current events, data, and people in various capacities.  This problem
falls within a larger vision of "distributed collaboration", or the
accomplishment of tasks through sharing of resources among many
interrelated individuals across administrative boundaries.  We are
particularly interested in resource discovery, because we see it as an
enabling (and currently limiting) technology for distributed
collaboration.

We impose three key goals on our approach to resource discovery.
First, we are interested in very large environments, spanning networks
of national or international size.  Such environments place severe
scalability requirements on the algorithms that can be used.  Second,
we want to support searches without imposing artificial constraints on
the resource space organization.  Traditional directory services (such
as the CCITT X.500 standard) rely on hierarchical organization to
achieve good scalability.  We wish to avoid a hierarchical search
space: as a hierarchy grows to register an increasingly wide variety of
resources, searching it becomes difficult, because the organization
becomes convoluted and requires users to understand how its components
are arranged.  Finally, we wish to
minimize the need for global administrative agreement over protocols
and information formats.  While standards help this process, it is
quite difficult to specify standards that are both globally adopted and
technologically current.

We are exploring a number of approaches to distributed collaboration
and resource discovery.  One technique involves using probabilistic
algorithms to build and search a resource graph that supports
attribute-based ("yellow pages") specifications, for which it is
desirable to find a small number of instances of a large class of
objects.  The resource graph evolves over time in accordance with what
resources exist, and the types of searches that users make.  Simulation
results indicate that this approach can support non-hierarchical
searches for an environment roughly the size of a country, with several
thousand administrative domains participating in resource registration
and searches.
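
To give a feel for what a probabilistic, non-hierarchical search can
look like, here is a small sketch in Python.  It is NOT the project's
protocol: it simply walks a graph of administrative domains at random
and stops once a small number of resources with the requested attribute
have been found.  The function name, the graph and registry layouts,
and the wanted/max_hops parameters are all assumptions made for
illustration.

```python
import random


def probabilistic_search(graph, registry, start, attribute,
                         wanted=2, max_hops=50, rng=None):
    """Random-walk search over domains (illustrative sketch only).

    graph:    domain -> list of neighbouring domains
    registry: domain -> list of (resource_name, set_of_attributes)
    Returns up to `wanted` (domain, resource_name) pairs.
    """
    rng = rng or random.Random()
    found = []
    visited = set()
    node = start
    for _ in range(max_hops):
        if node not in visited:
            visited.add(node)
            # Collect matching resources registered at this domain.
            for name, attributes in registry.get(node, []):
                if attribute in attributes and len(found) < wanted:
                    found.append((node, name))
        if len(found) >= wanted:
            break
        neighbours = graph.get(node, [])
        if not neighbours:
            break
        # Move to a randomly chosen neighbour; no hierarchy is consulted.
        node = rng.choice(neighbours)
    return found
```

Because the walk is random, different runs may return different
instances of the same class, and a search can come back with fewer
than `wanted` results if max_hops is exhausted first.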

A second technique involves building an understanding of the semantics
of particular resource discovery applications into the algorithms that
support searches.  Using this technique we have built and experimented
extensively with an Internet "white pages" directory tool (called
"netfind") capable of locating over 1,100,000 people in 1,900 sites
around the world.  We have distributed netfind on a limited basis to
approximately 50 researchers around the world, who are using it
actively.  Distribution is currently on hold, pending further
development.
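
The netfind abstract in the paper list notes that the tool achieves
good response time through parallel queries.  The sketch below shows
one way to issue several lookups concurrently in Python.  It is not
netfind's code: query_all is a hypothetical name, and each source is
just a function standing in for a real lookup (finger, SMTP, and so
on) that takes a person's name and a rough location.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(sources, person, place):
    """Run every lookup concurrently (illustrative sketch only).

    sources: name -> function(person, place) returning an answer,
             returning None, or raising on failure.
    Returns name -> answer for the sources that produced an answer.
    """
    workers = max(1, len(sources))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(fn, person, place)
                   for name, fn in sources.items()}
    # Leaving the "with" block waits for all lookups to finish.
    results = {}
    for name, future in futures.items():
        try:
            answer = future.result()
        except Exception:
            continue            # an unreachable source is simply skipped
        if answer:
            results[name] = answer
    return results
```

The total wait is then bounded by the slowest source rather than by
the sum of all of them, and one failing source does not spoil the
rest.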

Another study used graph-theoretic and traffic-analysis techniques to
analyze electronic mail communication patterns among approximately
50,000 people in 3,700 different sites around the world.  In addition
to the basic graph measurements, this study produced an algorithm that
has potential applications for distributed collaboration, as well as
privacy implications for electronic mail.
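
As a rough illustration of what a "basic graph measurement" on mail
traffic can mean, the Python sketch below builds an undirected graph
from (sender, recipient) pairs and counts each node's distinct
correspondents, first per person and then per site.  It does not
reproduce the study's method or data; the function names and the
rule of taking the site from the text after "@" are assumptions.

```python
from collections import defaultdict


def build_graph(pairs):
    """Undirected graph: node -> set of nodes it exchanged mail with."""
    neighbours = defaultdict(set)
    for sender, recipient in pairs:
        if sender != recipient:          # ignore self-loops
            neighbours[sender].add(recipient)
            neighbours[recipient].add(sender)
    return neighbours


def degrees(neighbours):
    """Number of distinct correspondents for every node."""
    return {node: len(adj) for node, adj in neighbours.items()}


def site_of(address):
    """Assumed convention: the site is whatever follows the last '@'."""
    return address.rsplit("@", 1)[-1].lower()


def site_graph(messages):
    """Collapse a person-level message list into a site-level graph."""
    return build_graph((site_of(s), site_of(r)) for s, r in messages)
```

Applying degrees() to build_graph(messages) gives person-level counts;
applying it to site_graph(messages) gives the same measure between
sites, with mail inside a single site dropped.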

Another subproject involves supporting resource discovery among the
vast array of resources available at public archives at tens of
thousands of sites around the Internet.  We have built a prototype
implementation based on an exploratory resource discovery paradigm, in
which users contribute to a distributed global resource space "map" as
they discover new resources, using a range of different information
sources of varying degrees of quality.  We are currently developing
this prototype further, so that we can distribute it to other sites
around the Internet, and attempt to build a map of Internet public
archives.  A longer-range goal is to define a new Internet protocol for
supporting large-scale distributed collaboration.
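
The abstract for the public archives paper (in the list below)
describes three levels of information quality: an archive-site
database, per-user and per-user-site caches, and a keyword index
built from USENET announcements.  A lookup that consults the levels
from best to worst, and a helper that records what a user finds,
might be sketched in Python as follows.  The names discover and
record_discovery, and the representation of each level as a
keyword-to-list mapping, are assumptions and not the prototype's
design.

```python
def record_discovery(cache, keyword, resource):
    """Add a resource a user has found to a cache (keyword -> list)."""
    entries = cache.setdefault(keyword, [])
    if resource not in entries:
        entries.append(resource)


def discover(keyword, sources):
    """Look up keyword in every level, best quality first.

    sources: list of (label, mapping keyword -> list of resources),
             ordered from highest to lowest information quality.
    Returns (resource, label) pairs; a resource known at several
    levels is reported once, under the best level that has it.
    """
    hits = []
    seen = set()
    for label, index in sources:
        for resource in index.get(keyword, []):
            if resource not in seen:
                seen.add(resource)
                hits.append((resource, label))
    return hits
```

Each result keeps the label of the level it came from, so a user can
tell a curated database entry from a hint gleaned from a news
article.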

Finally, we are beginning work on a subproject to use resource
discovery techniques to support a visual interface to network
management for the global Internet, to allow users to observe network
characteristics such as topology, geographical layout, protocol usage,
loading, and congestion.  A key technique involves using a number of
information sources and protocols, to support discovery in the absence
of global agreement on any one protocol or information source.  This
approach stands in contrast to relying on a single standard, such as
SNMP.  We believe this approach is important in large-scale,
administratively decentralized environments, in which it is difficult
to reach global agreement on, or full deployment of, a single standard.

A list of project papers follows:

%A M. F. Schwartz
%T Autonomy vs. Interdependence in the Networked Resource Discovery Project
%O Position paper, ACM SIGOPS European Workshop, Cambridge, England
%D September 1988
%X Available for anonymous FTP from latour.colorado.edu in the file
pub/RD.Papers/Auton.vs.Interdep.Wkshop.ps.Z

%A M. F. Schwartz
%T The Networked Resource Discovery Project
%J Proceedings of the IFIP XI World Congress
%C San Francisco, California
%D August 1989
%P 827-832
%K Track on Communications and distributed systems
%X Available for anonymous FTP from latour.colorado.edu in the directory
pub/RD.Papers/Early.Pjct.Descr
%X Abstract:  "Large scale computer networks provide access to a
bewilderingly large number and variety of resources, including retail
products, network services, and people in various capacities.  We
consider the problem of allowing users to \fIdiscover the existence\fR
of such resources in an administratively decentralized environment,
using a system architecture that accesses the distributed collection of
repositories that naturally maintain resource information.  A key
problem is organizing the resource space flexibly.  Rather than
imposing a hierarchical organization, our approach allows the resource
space organization to evolve in accordance with usage patterns.
Concretely, a set of \fIagents\fR organize and search the resource
space by constructing links between the repositories of resource
information based on keywords that describe the contents of each
repository, and the semantics of the resources being sought.  The links
form a general graph, with a flexible set of hierarchies embedded
within the graph to provide some measure of scalability.  The graph
structure evolves over time through the use of cache aging protocols.
Additional scalability is targeted through the use of probabilistic
graph protocols.  A simulation, prototype implementation, and
measurement study are under way."

%A M. F. Schwartz
%A P. G. Tsirigotis
%T Experience with a Semantically Cognizant Internet White Pages Directory
Tool
%J \fRTo appear\fP, Journal of Internetworking Research and Experience
%D 1990
%K Netfind
%X Available for anonymous FTP from latour.colorado.edu in the file
pub/RD.Papers/White.Pages.ps.Z
%X Abstract:  "As wide area networking technology and interconnection
improve, an increasingly important problem is allowing users to
navigate through the vast array of network accessible resources.  In
this paper we discuss experience with one technique we have developed
in this regard, applied to a specific resource class.  We have built a
prototype tool that provides a simple Internet "white pages" directory
facility.  Given the name of a user and a rough description of where
the user works (e.g., the company name or city), the tool attempts to
locate telephone and electronic mailbox information about that user.
We estimate that the scope of the directory is upwards of 1,147,000
users in 1,929 administrative domains, yet the tool does not require
the type of global cooperation that many existing or proposed directory
services require, namely, running special directory servers at many
sites around the Internet.  We accomplish this by building an
understanding of the semantics of this particular resource discovery
application into the algorithms that support searches, allowing the
tool to make aggressive use of existing sources of relatively
unstructured information.  Being able to make use of such information
is important in heterogeneous, administratively decentralized
environments, where global agreement about highly structured
information formats is difficult to achieve.  At present, the tool
utilizes information from USENET news messages, the Domain Naming
System, the Simple Mail Transfer Protocol, and the "finger" protocol,
as well as a variety of information about the meaning of and
relationships between these information sources.  Other sources of
resource information (such as the CCITT X.500 directory service) can
easily be incorporated into the tool as they become available.  The
tool achieves good response time through the use of parallel queries."

%A M. F. Schwartz
%T A Scalable, Non-Hierarchical Resource Discovery Mechanism Based on
Probabilistic Protocols
%R Technical Report CU-CS-474-90
%I Department of Computer Science, University of Colorado, Boulder,
Colorado
%D June 1990
%O Submitted for publication
%K Yellow pages, YP
%X Available for anonymous FTP from latour.colorado.edu in the directory
pub/RD.Papers/ProbYP
%X Abstract:  "Computer network interconnection provides access to a
bewildering array of resources, including databases, network services,
and people in various capacities.  We consider the problem of allowing
users to discover the existence of such resources in a large scale,
administratively decentralized environment.  While hierarchically
organized resource registries have good scalability properties, they
provide poor support for resource discovery, because users must
understand how the nested components are arranged.  In this paper we
present a probabilistic approach that supports non-hierarchical,
attribute based "yellow pages" searches.  The protocols support
locating a small number of instances of moderately large classes of
objects.  The resource graph evolves over time in accordance with what
resources exist and the types of searches that users make.  Simulation
results indicate that the approach can support scalable and flexible
resource discovery for an environment roughly the size of a large
country, with several thousand administrative domains participating in
resource registration and searches.  Moreover, the probabilistic search
strategy naturally supports fair access among competing information
providers."

%A M. F. Schwartz
%A D. C. M. Wood
%T A Measurement Study of Organizational Properties in the Global
Electronic Mail Community
%R Technical Report CU-CS-482-90
%I Department of Computer Science, University of Colorado, Boulder,
Colorado
%D August 1990
%O Submitted for publication
%X Available for anonymous FTP from latour.colorado.edu in the directory
pub/RD.Papers/Email.Study
%X Abstract:  "Computer systems intended for use in large scale
environments are typically organized according to rigid hierarchical
structures.  For example, traditional file and directory services rely
on hierarchical organization to enhance scalability.  Motivated by
hierarchy's poor support for navigating among large, highly diverse
collections of resources (the \fIresource discovery\fR problem), we
have become interested in organizational structures that arise
naturally when people collaborate.  In this paper we explore the graph
structure resulting from global electronic mail communication.  We
characterize the structure through analysis of data collected about
international electronic mail communication patterns among
approximately 50,000 people in 3,700 different administrative domains.
We define an \fIInterest Specialization Graph\fR structure that
provides the scalability of a hierarchy without its organizational
inflexibility.  We believe that systems organized with this graph
structure offer promise of better supporting the organizational needs
of a large environment characterized by widespread interorganizational
collaboration."

%A M. F. Schwartz
%A D. R. Hardy
%A W. K. Heinzman
%A G. Hirschowitz
%T Supporting Resource Discovery Among Public Internet Archives Using a
Spectrum of Information Quality
%R Technical Report CU-CS-487-90
%I Department of Computer Science, University of Colorado, Boulder,
Colorado
%D September 1990
%O Submitted for publication
%X Available for anonymous FTP from latour.colorado.edu in the file
pub/RD.Papers/RD.For.Anon.FTP.ps.Z
%X Abstract:  "Wide area networks offer access to an increasing number
and variety of resources, such as documents, software, data, network
services, and people.  Yet, it is difficult to locate resources of
interest, because of the scale and decentralized nature of the
environment.  We are interested in supporting a global confederation of
loosely cooperating systems and users that share far more resources
than can be completely organized.  Therefore, mechanisms are needed to
support incremental organization of the resources, based on the efforts
of many geographically decentralized individuals, and a range of
different information sources of varying degrees of quality.  In this
paper we describe a prototype implementation of a set of mechanisms
intended to explore this problem in the specific domain of public
Internet archives, accessible via the "anonymous" File Transfer
Protocol.  This is an interesting test case, because it encompasses a
very large scale, administratively decentralized collection of
resources, with considerable practical value.  The resource discovery
paradigm is exploratory in nature, with users contributing to the
global resource space organization as they discover new resources.  At
present, three levels of information quality are supported.  At the
highest level, resources are described using an archive-site-resident
database, with individual resources described according to their
conceptual roles.  Below that, per-user and per-user-site caches are
maintained, to record resources that have been found by individual
users during their explorations.  At the lowest level, the system
monitors announcements of public archive availability from USENET
electronic bulletin board articles, to provide a simple keyword-based
index of resources throughout the global network."