xcaret@csn.org (Xcaret Research) (05/03/91)
Some concerns have been raised in various newsgroups about the potential Internet load and legal propriety of NetFind, a white pages tool sold and distributed by Xcaret Research, Inc. Xcaret Research appreciates the concern of individuals and organizations who keep network resources from being abused, and we would like to make it clear that we share that concern. In fact, the authors of NetFind considered the load imposed by NetFind very carefully, and conducted a six-month study to gather information about the usage of NetFind and the load it imposes on the Internet. In this message we give an overview of NetFind, and then address these concerns.

Given the name of a person on the Internet and a rough description of where the person works (such as the name of the institution, or the city/state/country in which it is located), NetFind searches for electronic mailbox information about the person. NetFind uses a unique method: it actively searches the Internet for the person. It does not attempt to keep a database of users across the Internet; such a database would be quite large, difficult to populate completely, and constantly out of date. Instead, NetFind uses the natural database of the Internet itself: it sends multiple parallel requests across the Internet to machines where it suspects the person may reside. Because the searches go out in parallel, the whole process is surprisingly fast. NetFind can locate over 1.4 million people at 2,500 different sites around the world, with response times on the order of 5-30 seconds per search.

The primary concern that arose about NetFind was its potential load on the Internet. Clearly, any tool that used parallel searches to descend from the top of the Domain tree and search every server would be unreasonably costly. NetFind does not do this. The NetFind search procedure uses several mechanisms that significantly limit the scope of searches.
First, the user selects at most 3 domains to search (one example of a domain being "colorado.edu") from the list of domains matching the organization component of the search request. Next, NetFind queries the Domain Naming System to locate authoritative name server hosts for each of these domains. The idea is that these hosts are often central administrative machines, with accounts and/or mail forwarding information for many users at a site. Each of these machines is then queried using the Simple Mail Transfer Protocol, in an attempt to find mail forwarding information about the specified user. If such information is found, the located machines are then probed using the "finger" protocol, to reveal more detailed information about the person being sought. The results of finger searches can sometimes yield other machines to search as well.

A number of mechanisms allow searches to proceed when some of these protocols are not supported on remote hosts. Ten lightweight threads allow sets of DNS/SMTP/finger lookup sequences to proceed in parallel, increasing resilience to host and network failures. The tool enforces a number of other restrictions on the cost of searches, such as a bound on the total number of hosts to finger.

NetFind began as a research prototype, designed and implemented by Michael Schwartz and Panagiotis Tsirigotis at the University of Colorado. Before becoming a commercial product, the research prototype was deployed at approximately 50 institutions worldwide, and extensive measurements were collected over a period of 6 months of use: the cost of searches, the time distribution of searches, etc. The average search uses 136 packets. While this is larger than typical directory services (like X.500), NetFind has significantly larger scope and better timeliness properties than these other services, since it gets its information from the sources where people do their daily computing, rather than from auxiliary databases.
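The SMTP-then-finger probe sequence described above can be sketched in a few lines. This is my own illustrative reconstruction, not NetFind's actual code: the function names are invented, and the DNS step is omitted since it needs a resolver library.

```python
# A minimal sketch of a NetFind-style probe: ask a mailer about a
# mailbox with SMTP VRFY, then finger the host that claims to know the
# user.  Names and structure here are illustrative assumptions.
import socket

def smtp_vrfy_command(user):
    """Format an SMTP VRFY command asking a mailer about a mailbox."""
    return ("VRFY " + user + "\r\n").encode("ascii")

def finger_request(user):
    """Format a finger protocol query line for a user name."""
    return (user + "\r\n").encode("ascii")

def finger_probe(host, user, timeout=10):
    """Finger `user` at `host` (TCP port 79); return the reply text,
    or None if the host is unreachable or runs no finger server."""
    try:
        with socket.create_connection((host, 79), timeout=timeout) as s:
            s.sendall(finger_request(user))
            chunks = []
            while True:
                data = s.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks).decode("latin-1", "replace")
    except OSError:
        return None
```

A real NetFind-style search would, as the post describes, run several of these lookup sequences in parallel threads and cap the total number of hosts probed.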
To put this cost into perspective, it is equivalent to a very short telnet session or a moderate-size FTP session. We estimate that if NetFind were used by one hundred people at each site on the Internet where NetFind can find people, it would increase the NSFNET load by approximately 1.4% above its current level of 4 billion packets per month. In comparison, FTP currently accounts for 23% of NSFNET packets. Moreover, the load generated by NetFind represents the addition of a significant new type of service, and providing new services necessarily increases network load. A detailed discussion of the research that led to the NetFind product is available in the paper "Experience with a Semantically Cognizant Internet White Pages Directory Tool", Journal of Internetworking: Research and Experience 2, 1 (March, 1991).

As for the legal issue: some people have expressed concern that NetFind represents an inappropriate use of the Internet because it is commercial software. This is a misinterpretation of network appropriate use policy, which regulates the type of traffic that traverses the network, not the type of software that generates that traffic. There are many pieces of commercial software that generate packets on the Internet, such as Sun's TCP implementation. As with those other pieces of software, responsibility for appropriate use rests in the hands of the user. Just as it would be inappropriate to use FTP to transfer commercial data across the Internet, it would be inappropriate to use NetFind for commercial purposes. Yet there are many appropriate uses for FTP, and for NetFind.

If you have further questions about NetFind, please contact:

    Xcaret Research, Inc.
    2060 Broadway, Suite 320
    Boulder, CO 80302
    (800) 736-1285
    netfind@xcaret.com
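As a back-of-the-envelope check of the 1.4% figure above, the other quoted numbers can be combined to solve for the per-user search rate it implies. That rate is not stated in the post, so it is derived here rather than quoted:

```python
# Numbers quoted in the post; the searches-per-user rate is derived.
SITES = 2_500                        # sites where NetFind can find people
USERS_PER_SITE = 100                 # hypothetical NetFind users per site
PACKETS_PER_SEARCH = 136             # measured average
NSFNET_PACKETS_PER_MONTH = 4_000_000_000
CLAIMED_LOAD_FRACTION = 0.014        # "approximately 1.4%"

users = SITES * USERS_PER_SITE
implied_searches_per_user_per_month = (
    CLAIMED_LOAD_FRACTION * NSFNET_PACKETS_PER_MONTH
    / (PACKETS_PER_SEARCH * users)
)
print(round(implied_searches_per_user_per_month, 2))  # prints 1.65
```

In other words, the 1.4% estimate corresponds to each of the 250,000 hypothetical users searching roughly one to two times per month.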
emv@ox.com (Ed Vielmetti) (05/03/91)
I'll believe all of the quantitative measurements about NetFind being sparing of Internet resources, carefully sending out as few packets as possible and not doing anything stupid. Think of it as an expert system, where the expert modeled is the "expert internet user". From the description of it, I think that an expert internet user like myself could do a better job, though perhaps not as quickly, because I have access to more specialized and better databases than just DNS/SMTP/finger, and more tricky and unobvious ways of looking.

My major problem with tools like NetFind is that although they address the "resource discovery" problem for a single user, they don't have any positive side-effects for the rest of the internet. Nothing about NetFind adds to any Internet infrastructure; it doesn't make the problem any easier for the next person down the line or somewhere else who has the same problem. In comparison, the efforts of the various X.500 projects produce something tangible that the rest of the network can consume later.

Systems which consume Internet resources and don't have any positive benefits for the rest of the network are Evil and Rude, no matter how small the resources are that they consume. Things which have been placed into this category at various times are email-based archive servers (because of their accidental and heavy loads on transit mail systems), network management by means of pinging random machines, "mail throughput testers" which send mail through a congested system to see how congested the mail system is (!?), and rebroadcasting huge binaries to usenet newsgroups upon the request of one or two people who missed them. A badly implemented NetFind could fall into this category; there's no sign that it actually does. In this particular case, however, since the research has been published, the prospective user of NetFind can look up the algorithms involved and see just how clever the product is before buying.
Since most of the ad hoc expert systems for Internet user location haven't been written down, codified, and studied, this is useful information which deserves a closer look. See also latour.colorado.edu:/pub/RD.Papers/White.Pages.ps.Z, a preprint of the NetFind paper in the Journal of Internetworking.

If I read the paper the right way, users of NetFind are expected to monitor usenet news and store a database of hostname/organization pairs on disk, like the following MH scan would produce:

    scan -format '%{Organization} %{From}'

and keep this around for a while (after trimming out user names). Modulo a few goofy things you'll see from people supplying their own headers (scan alt.sex.pictures to see that) and bland usenet-internet gateways (see this newsgroup for that), that information's rather good. Keep it for a few months, for the newsgroups you expect to care about, and your ability to find people should be substantially enhanced.

--
Msen	Edward Vielmetti
/|---	moderator, comp.archives
	emv@msen.com

"(6) The Plan shall identify how agencies and departments can collaborate to ... expand efforts to improve, document, and evaluate unclassified public-domain software developed by federally-funded researchers and other software, including federally-funded educational and training software;" High-Performance Computing Act of 1991, S. 218
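The harvesting idea above (keep the Organization header and the host part of the From address, trim the user name) could be sketched as follows. The header layout and function name are my assumptions about typical articles, not anything from the post:

```python
# Extract an (organization, host) pair from a netnews article, keeping
# only the host part of the From address, as suggested above.
import re
from email.parser import Parser

def org_host_pair(article_text):
    """Return (Organization header, host part of From); either element
    may be None if the header is missing or unparseable."""
    headers = Parser().parsestr(article_text, headersonly=True)
    org = headers.get("Organization")
    m = re.search(r"[\w.+-]+@([\w.-]+)", headers.get("From", ""))
    return (org, m.group(1) if m else None)

example = (
    "From: emv@msen.com (Edward Vielmetti)\n"
    "Organization: MSEN Inc.\n"
    "Subject: NetFind\n"
    "\n"
    "article body\n"
)
print(org_host_pair(example))  # prints ('MSEN Inc.', 'msen.com')
```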
schwartz@latour.colorado.edu (Mike Schwartz) (05/07/91)
In article <EMV.91May3032230@poe.aa.ox.com> emv@ox.com (Ed Vielmetti) writes:
> My major problem with tools like NetFind is that although they address
> the "resource discovery" problem for a single user, they don't have any
> positive side-effects for the rest of the internet. Nothing about
> NetFind adds to any Internet infrastructure; it doesn't make the problem
> any easier for the next person down the line or somewhere else who has
> the same problem. In comparison, the efforts of the various X.500
> projects produce something tangible that the rest of the network can
> consume later.

Maybe this is too simplistic an interpretation, but it seems your argument boils down to the fact that NetFind is basically a client of existing services, rather than a new service in its own right (like X.500). But from a user's perspective, this distinction is irrelevant. What counts is whether the user can find the information they need, how easily, and at what cost to the network. It's true that keeping information in a server would allow that information to be cached for future searches, but I have found that if someone is "reachable" by NetFind, it is usually pretty easy to find them with NetFind. There isn't much need to look at what someone else did to search for that person. As an aside, searching for more general types of resources (like anonymous FTP files) is a harder problem, and the architecture I use for that project does utilize the results of previous users' searches in facilitating future users' searches.

I think your objection, that a tool only helps one user at the time of use without contributing to other users by its specific use, is really wrong. If this were the standard against which all software was compared, we would get rid of most of the software in the world. I also think your view of what is "tangible" is biased by your role as moderator of comp.archives.
This is a nice contribution to "network infrastructure", but as I see it, generating information collections (which is what both comp.archives and X.500 do, in a general sense) is only one way for users to get and share information. In fact, I believe it makes more sense to search for some types of resources where they naturally reside than it does to build a database about them, since the database needs to be populated and kept up to date. I see at least 3 cases where this can be true:

1. Dynamic, timely data.

2. Data with problems of transfer of authority (i.e., where people may not be willing to relinquish control of their data to relatively centralized administration, like a server per site).

3. Large information spaces of the nature that only a small fraction of the data will ever be needed (and hence the effort to populate a database will not be effectively amortized).

Internet white pages fits at least (1), since users move around, and tracking their movements in a database presents administrative problems. I believe it fits (2) and (3) as well.

 - Mike Schwartz
   Dept. of Computer Science
   Univ. of Colorado - Boulder
emv@ox.com (Ed Vielmetti) (05/07/91)
In article <1991May6.173923.174@colorado.edu> schwartz@latour.colorado.edu (Mike Schwartz) writes:
>> My major problem with tools like NetFind is that although they address
>> the "resource discovery" problem for a single user, they don't have any
>> positive side-effects for the rest of the internet.
>
> Maybe this is too simplistic an interpretation, but it seems your
> argument boils down to the fact that NetFind is basically a client of
> existing services, rather than a new service in its own right (like
> X.500).

It may be just a matter of terminology; if NetFind were billed as just a souped-up version of finger, then it could be evaluated in the context of being basically a client of other services. But with the claims of it being a "Semantically Cognizant Internet White Pages Directory Tool" with the ability to reach "1,147,000 users in 1,929 administrative domains", when it's mentioned in the same breath as X.500 projects and as an alternative to them, something about it calls for a more critical examination.

Just to qualify the numbers: 1,147,000 reachable users is 1,929 reachable domains, each with an average of 119 hosts (a mean based on a sample of 75), with each of those hosts assumed to contain a "conservative estimate" of 5 users. I don't see any breakdown of success rate by type of domain; notably, the only success numbers I could find (an 80+% hit rate by day, 70+% by night) don't attempt to measure success for the 40% of the database that's not in the USA. Perhaps there are a million people out there; I'm not convinced of how many of them you can find. The performance figures also didn't correct for sample bias in the observer; it would be expected that the author would look for people in a field related to his own (computer science).
Since computer science departments are often the ones in charge of running the name servers on campus, the particular happy accident of the search algorithm relying on SMTP lookups to the primary name servers may work overly well for CS department searches. It is less likely to work well for lookups on people who are more peripheral to the campus network infrastructure. An interesting exercise would be to run NetFind against the names of 10 senior librarians, 10 junior physics faculty members, 10 mathematics graduate students, and 10 undergraduate French majors, suitably scattered about; I have some guesses as to how well your results would turn out.

(In truth, none of the numbers tossed around in the paper are especially convincing; it would have been appropriate to qualify estimated packet counts and user counts with estimated error ranges. It's not possible for me to justify 1,147,000 users any more than 1,146,000 users; a more plausible figure is "on the order of a million users". That's especially true without a good rationale for picking 5 users per host, a figure which appears out of the blue with absolutely no references....)

I note that your paper shows (fig 3) that usage of your NetFind prototype tapers off to an average of one use every two weeks. There is no indication from the study of why usage dropped so sharply from the original high average of 7 uses in the first day, or why it falls so far below the estimated 10 searches per week (quoted from RFC 1107). Given the expectation of relatively static communities of interest and the ready availability of e-mail address information for potential colleagues by alternative access methods (business cards, telephone calls, private mailing lists, netnews), it's not surprising to me that the need for zero-prior-knowledge user lookup is lower than once per day. But given that usage trails off to almost nil after 200 days of use, it would seem to call into question the long-term usability of your product.
Have you done any retrospective work on determining why usage levels dropped so low?

> ... I have found that if someone is "reachable" by NetFind, it is
> usually pretty easy to find them with NetFind.

That's hard to argue with. But it doesn't yield any insight into what makes people hard to locate, or how to design campus and corporate information systems so that people can easily be found without resorting to extraordinary sleuthing measures. You casually write off (in section 5, Related Work) the efforts of campuses to provide local X.500 services which are accessible via finger; though it's not directly germane to your research, it would at least have been useful to point out that X.500 servers can be deployed within the existing system to good effect. Stick an X.500 system at yourdomain.org with a big pile of user names in it, make it so finger@yourdomain.org does the right thing, and for large institutions like UIUC, UMich, and MIT you have solved a larger problem than trying to chase pointers through a domain hierarchy. Granted, the information is somewhat more stale and less likely to be exactly true; but I think it's arguable that zero-knowledge searches are looking more for a pointer to information than for an exact match. (E.g., finger vielmetti@umich.edu and you'll get something, but you might have to chase it down a bit to find out from a human that I've moved recently.)

> As an aside, searching for more general types of resources (like
> anonymous FTP files) is a harder problem, and the architecture I use
> for that project does utilize the results of previous users' searches
> in facilitating future users' searches.

Yes, I've read the paper; I can't say that it compares with a service like "archie", though, even if the software were available. My reactions to that paper can wait for another message.
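The finger-front-end idea above can be illustrated with a toy port-79 responder that answers queries out of a local directory instead of the local password file. The directory entry here is invented for illustration; a real deployment would consult the campus X.500 (or other) database:

```python
# Toy finger responder backed by a local directory, illustrating the
# finger@yourdomain.org front-end idea.  The entry is an invented
# example, not real directory data.
import socketserver

DIRECTORY = {
    "vielmetti": "Edward Vielmetti <emv@msen.com> (moved recently)",
}

def lookup(query):
    """Resolve one finger query line against the local directory."""
    key = query.strip().lower()
    return DIRECTORY.get(key, "no match for " + repr(key))

class FingerHandler(socketserver.StreamRequestHandler):
    def handle(self):
        query = self.rfile.readline().decode("latin-1")
        self.wfile.write((lookup(query) + "\r\n").encode("latin-1"))

# To serve (binding port 79 needs privileges; a high port is fine for
# local testing):
#   socketserver.TCPServer(("", 7979), FingerHandler).serve_forever()
```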
I'm not impressed with the amount of effort you've spent on seeing how people have really addressed the problem; in particular, your success rates for scanning the net for interesting information are skewed because I'm doing it for you already....

> I think your objection to a tool only helping one user at the time of
> use, without contributing to other users by its specific use, is really
> wrong. If this were the standard against which all software was
> compared, we would get rid of most of the software in the world.

I think my point is valid. For me to want to let you accomplish a particular task on the Internet (a shared, finite resource), you need to justify to me that it's worth it to let you interpose your packets in the way of my packets on the way to their destination. I will be unwilling to do this unless I'm generous, or unless I can see some benefit (or very low cost) from your doing so. Remember that your use of the net is generally going to make my use of the net marginally slower, less convenient, and more risky, unlike, say, your use of an editor on your local system. That's the story of negative externalities and the "tragedy of the commons": everyone does a little thing that's convenient for them but which causes the playground to be littered. (E.g., the cutoff of nudie pictures on USA FTP sites causing the saturation of the USA-Finland internet link, and the subsequent barrage of traffic in alt.sex.pictures.) Provide me with something useful, a scrap of code I can use or a good idea to work with, and I'll let you go about your business.

NetFind does seem to pose certain risks to the rest of the net; you could be very efficiently bombarding my slow links on a wild goose chase trying to find someone somewhere else. In truth, I'm sure that the tradeoff is positive, and that I would be quite happy if just one person somewhere used NetFind to find me.
A more salient risk is that successful efforts like NetFind would lead people to believe that generating queryable information collections a la X.500 is not necessary in the long run, and that we'd be content with ad hoc solutions.

[ The paper I'm making references to is ftp'able as latour.colorado.edu:/pub/RD.Papers/White.Pages.ps.Z ]

--
Edward Vielmetti, vice president for research, MSEN Inc.	emv@msen.com

"often those with the power to appoint will be on one side of a controversial issue and find it convenient to use their opponent's momentary stridency as a pretext to squelch them"
kline@ux1.cso.uiuc.edu (Charley Kline) (05/07/91)
emv@ox.com (Ed Vielmetti) writes:
> Stick an X.500 system at yourdomain.org with a big pile of user names
> in it, make it so finger@yourdomain.org does the right thing, and for
> large institutions like UIUC, UMich, MIT you have a larger problem
> solved than trying to chase pointers through a domain hierarchy.
> Granted, the information is somewhat more stale and less likely to be
> exactly true; but I think it's arguable that zero-knowledge searches
> are looking more for a pointer to information than an exact match.
> (e.g. finger vielmetti@umich.edu and you'll get something, but you
> might have to chase it down a bit to find out from a human that I've
> moved recently.)

Since you mentioned UIUC... We're not an X.500 shop here, but we do have a campus-wide "white pages" service which users can update themselves, and it has an interface to sendmail as well as finger. You can find me with "finger 'charley kline'@uiuc.edu", and you can send mail to people's full names, as in "mail stacy-forsythe@uiuc.edu". People who move can change their own entries, so the information stays current for as long as a person cares to maintain it.

I think the point is that organizational white-pages databases are already in great supply. I wonder if the "Semantically Cognizant White Pages Service" understands the semantics of the various ones in use. If so, it would make the search for people that much less intensive.

________________________________________________________________________
Charley Kline, KB9FFK, PP-ASEL                   c-kline@uiuc.edu
University of Illinois Computing Services        Packet: kb9ffk@w9yh
1304 W. Springfield Ave, Urbana IL 61801         (217) 333-3339
ddean@rain.andrew.cmu.edu (Drew Dean) (05/08/91)
There are some interesting points here. If the person you're trying to find has (a) been on the net for a long time, (b) works for the military or a military contractor, or (c) is the technical or administrative contact for a domain, a whois query to nic.ddn.mil will usually get an answer. But even relatively well-known people such as Ed Vielmetti aren't in that database. Stanford runs a whois server on stanford.edu that has a campus database in it, so that's useful if you know a person is there; Ed points out UMich, MIT, and UIUC; at CMU, finger name@andrew.cmu.edu will do the same thing, and if you know they're in CS, finger name@cs.cmu.edu will also work. However, most of the net isn't set up like this, although I'd say it would probably be a good thing.

If you know where a person is (and you're lucky :-)), a nice note to postmaster is another reasonable approach. If not, nslookup and fingering the main machines (i.e., not every workstation in a cluster, just the fileservers and time-sharing machines) will usually work. For those who are SMTP literate, the VRFY command is also worth trying, although certain SMTP servers don't support it.

So if a person is in the NIC database, or you know where they are, you can find them without too much work. The big problem is when neither of these cases applies. Would someone like to donate a machine to run a really big whois database? Even so, you still have an aliasing problem; the current whois database at nic.ddn.mil has 2 "Adams, Rick" entries, for example. (It gives email addresses and phone numbers for both, so if you (think you) know where they are, it's easy to get the right one; but in this networked age I might not know where they are -- if I can reach them via the net, who cares?) The NIC (and CMU) solution of 4-character alphanumeric IDs seems a bit impersonal, at best, although I won't complain because I don't have a better idea....:-) This is the case that NetFind may be good for; I haven't seen it, so I can't comment.
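The whois lookup described above is a simple enough protocol (one query line over TCP port 43, text back) that it can be sketched directly. The function names here are mine, not from any standard client:

```python
# A minimal whois client in the style of the nic.ddn.mil lookups
# described above.
import socket

def whois_query_line(name):
    """Format a whois query for a person or domain name."""
    return (name + "\r\n").encode("ascii")

def whois(name, server="nic.ddn.mil", timeout=10):
    """Ask `server` about `name`; return the reply text, or None if the
    server cannot be reached."""
    try:
        with socket.create_connection((server, 43), timeout=timeout) as s:
            s.sendall(whois_query_line(name))
            reply = []
            while True:
                data = s.recv(4096)
                if not data:
                    break
                reply.append(data)
        return b"".join(reply).decode("latin-1", "replace")
    except OSError:
        return None
```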
However, if you don't have a good idea where to start, I don't see how it can avoid traversing the country on (costly) backbones -- which is the problem if a lot of people use it. It seems we're no closer than when we started, but with a machine generating the fingers and VRFYs rather than a person. Sigh....

--
Drew Dean
Drew_Dean@rain.andrew.cmu.edu
[CMU provides my net connection; they don't necessarily agree with me.]
asp@UUNET.UU.NET (Andrew Partan) (05/12/91)
> From: csn!xcaret (Xcaret Research)
> Subject: NetFind and its Internet load
> .... NetFind queries the Domain Naming System to locate authoritative
> name server hosts for each of these domains. ....
> .... Each of these machines is then queried using the Simple Mail
> Transfer Protocol ....
> .... located machines are then probed using the "finger" protocol ....

I assume that you have thought about domains that are not on the Internet and only have MX records; about nameservers that do not run SMTP servers; about hosts that do not run finger; and about firewalls and gateways that do not permit some protocols or hosts to be reached?

We run a nameserver for 1000+ domains that are not on the Internet; said nameserver does not run SMTP or finger. We are an MX forwarder for 900+ domains. I don't want my mail servers hit with more load - at times every cycle counts.

	--asp@uunet.uu.net (Andrew Partan)
schwartz@latour.colorado.edu (Mike Schwartz) (05/12/91)
In article <9105112005.AA04828@uunet.uu.net> asp@UUNET.UU.NET (Andrew Partan) writes:
> I assume that you have thought about domains that are not on the
> Internet (...) I don't want my mail servers hit with more load - at
> times every cycle counts.

NetFind does not probe servers that are in different domains than the institutions being searched. So, if a site has mail forwarding through uunet (for example), NetFind will tell the user the site isn't on the Internet, and not probe that domain further.

 - Mike