[comp.newprod] NetFind and its Internet Load

xcaret@csn.org (Xcaret Research) (05/03/91)

[I know that this is not a new product announcement.  It appears to be
an attempt to allay some concerns raised by the recent comp.newprod
posting.  As it provides some interesting additional information, I've
made an exception for it.  -mod]

Some concerns have been raised in various newsgroups about the
potential Internet load and legal propriety of NetFind, a white pages
tool sold and distributed by Xcaret Research, Inc.  Xcaret Research
appreciates the concern of individuals and organizations who keep
network resources from being abused, and we would like to make it clear
that we share that concern.  In fact, the authors of NetFind considered
the load it imposes very carefully, and conducted a six-month study to
gather information about the usage of NetFind and the load it imposes
on the Internet.

In this message we give an overview of NetFind, and then address these
concerns.

Given the name of a person on the Internet and a rough description of
where the person works (such as the name of the institution or the
city/state/country in which it is located), NetFind searches for
electronic mailbox information about the person.  NetFind uses a unique
method to actively search the Internet for the person.  It does not
attempt to keep a database of users across the Internet; such a database
would be quite large, difficult to populate completely, and constantly
out of date.  Instead, NetFind uses the natural database of the Internet
itself: it sends multiple parallel requests across the Internet to
machines where it suspects the person may reside.  Because the requests
go out in parallel, the whole process is surprisingly fast.
NetFind can locate over 1.4 million people in 2,500 different sites
around the world, with response time on the order of 5-30 seconds per
search.

The primary concern that arose about NetFind was its potential load on
the Internet.  Clearly, any tool that uses parallel searches to descend
from the top of the Domain tree and search each server would be
unreasonably costly.  NetFind does not do this.  The NetFind search
procedure uses several mechanisms that significantly limit the scope of
searches.  First, the user selects at most 3 domains to search (an
example of one domain being "colorado.edu"), from the list of domains
matching the organization component of the search request.  Next,
NetFind queries the Domain Naming System to locate authoritative name
server hosts for each of these domains.  The idea is that these hosts
are often on central administrative machines, with accounts and/or mail
forwarding information for many users at a site.  Each of these machines
is then queried using the Simple Mail Transfer Protocol, in an attempt
to find mail forwarding information about the specified user.  If such
information is found, the located machines are then probed using the
"finger" protocol, to reveal more detailed information about the person
being sought.  The results from finger searches can sometimes yield
other machines to search as well.  A number of mechanisms are used to
allow searches to proceed when some of the protocols are not supported
on remote hosts.  Ten lightweight threads run sets of DNS/SMTP/finger
lookup sequences in parallel, which increases resilience to host and
network failures.  The tool enforces a number of
other restrictions on the cost of searches, such as the total number of
hosts to finger.
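
The search procedure described above can be sketched roughly as
follows.  This is an illustrative Python sketch, not NetFind's actual
code.  The function names (search, ns_lookup, smtp_lookup,
finger_lookup) and the cap of 5 fingered hosts are assumptions made for
the example; the limit of 3 domains and the ten threads come from the
description above.  The three lookup functions are supplied by the
caller, and for simplicity the sketch runs the DNS, SMTP and finger
steps as three successive parallel stages rather than as independent
per-domain sequences.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_DOMAINS = 3        # from the text: the user selects at most 3 domains
NUM_THREADS = 10       # from the text: ten lightweight threads
MAX_FINGER_HOSTS = 5   # hypothetical cap; the text gives no figure


def _safe(fn, default, *args):
    """Call fn(*args); treat a host or network failure as 'no answer'."""
    try:
        return fn(*args)
    except OSError:
        return default


def _unique(groups):
    """Flatten lists of host names, dropping duplicates but keeping order."""
    seen = []
    for group in groups:
        for host in group:
            if host not in seen:
                seen.append(host)
    return seen


def search(user, domains, ns_lookup, smtp_lookup, finger_lookup):
    """Return {host: finger_output} for hosts that know about `user`.

    ns_lookup(domain)         -> list of name server hosts for the domain
    smtp_lookup(host, user)   -> list of hosts the user's mail forwards to
    finger_lookup(host, user) -> finger output as a string, or None
    """
    if len(domains) > MAX_DOMAINS:
        raise ValueError("select at most %d domains" % MAX_DOMAINS)
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
        # Stage 1: find the authoritative name servers of each domain.
        ns_hosts = _unique(
            pool.map(lambda d: _safe(ns_lookup, [], d), domains))
        # Stage 2: ask each of those hosts, via SMTP, where mail goes.
        mail_hosts = _unique(
            pool.map(lambda h: _safe(smtp_lookup, [], h, user), ns_hosts))
        # Stage 3: finger a bounded number of the located machines.
        targets = mail_hosts[:MAX_FINGER_HOSTS]
        answers = list(
            pool.map(lambda h: _safe(finger_lookup, None, h, user), targets))
    return {host: info for host, info in zip(targets, answers) if info}
```

A host that fails to answer contributes nothing instead of aborting the
search, and the fixed caps bound the number of hosts contacted no matter
how many candidates each stage turns up.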

NetFind began as a research prototype, designed and implemented by
Michael Schwartz and Panagiotis Tsirigotis at the University of
Colorado.  Before becoming a commercial product, the research prototype
was deployed at approximately 50 institutions worldwide.  Over 6 months
of use, extensive measurements were collected on the cost of searches,
the time distribution of searches, and related quantities.

The average search uses 136 packets.  While this is more than a lookup
in a typical directory service (like X.500), NetFind has significantly
larger scope and better timeliness properties than these other services,
since it gets information from the sources where people do their daily
computing, rather than from auxiliary databases.  To put the cost into
perspective, it is equivalent to a very short telnet session or a
moderate-size FTP session.

We estimate that if NetFind were to be used by one hundred people at
each site on the Internet where NetFind can find people, it would
increase the NSFNET load by approximately 1.4% above its current load of
4 billion packets per month.  In comparison, FTP currently accounts for
23% of the NSFNET packets.  Moreover, the load generated by NetFind
represents the addition of a significant new type of service, and
providing new services will necessarily increase network load.
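
As a back-of-the-envelope check, the figures quoted in this message can
be combined to see what usage rate the 1.4% estimate implies.  The
Python sketch below assumes that every packet of a search crosses the
NSFNET and that the estimate covers the 2,500 sites mentioned earlier.
Neither assumption is stated in the estimate itself, so the result
should be read as a rough illustration only.

```python
PACKETS_PER_SEARCH = 136                   # average search, from the text
NSFNET_PACKETS_PER_MONTH = 4_000_000_000   # current load, from the text
LOAD_INCREASE_PER_MILLE = 14               # 1.4%, from the text
SITES = 2_500                              # sites NetFind can search
USERS_PER_SITE = 100                       # "one hundred people at each site"


def implied_usage():
    """Return (extra packets, searches, searches per user) per month."""
    extra_packets = NSFNET_PACKETS_PER_MONTH * LOAD_INCREASE_PER_MILLE // 1000
    searches = extra_packets / PACKETS_PER_SEARCH
    users = SITES * USERS_PER_SITE
    return extra_packets, searches, searches / users


extra, searches, per_user = implied_usage()
print(f"{extra:,} extra packets per month")
print(f"{searches:,.0f} searches per month")
print(f"{per_user:.2f} searches per user per month")
```

Under these assumptions, 1.4% corresponds to 56 million extra packets
per month, or roughly 412,000 searches, which is between one and two
searches per user per month.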

A detailed discussion of the research that led to the NetFind product is
available in the paper "Experience with a Semantically Cognizant
Internet White Pages Directory Tool", Journal of Internetworking:
Research and Experience 2, 1 (March, 1991).

As for the legal issue: some people have expressed concern that NetFind
represents an inappropriate use of the Internet, because it is
commercial software.  This is a misinterpretation of the network's
appropriate use policy, which regulates the type of traffic that
traverses the network, not the type of software that generates that
traffic.  There are many pieces of commercial software that generate
packets on the Internet, such as Sun's TCP implementation.  As with
these other pieces of software, appropriate use responsibility rests in
the hands of the user.  Just as it would be inappropriate to use FTP to
transfer commercial data across the Internet, it would be inappropriate
to use NetFind for commercial purposes.  Yet, there are many appropriate
uses for FTP, and for NetFind.

If you have further questions about NetFind, please contact:

	Xcaret Research, Inc.
	2060 Broadway, Suite 320
	Boulder, CO  80302
	(800) 736-1285
	netfind@xcaret.com