[comp.sources.wanted] Announcing "Archie 1.0": The Archive Server Server

bajan@opus.cs.mcgill.ca (Alan Emtage) (11/15/90)

 McGill School of Computer Science Operating "archie"
 ------------------------------------------------------
 - An Internet Archive Server Listing Service
 ----------------------------------------------
 
 Given the number of hosts being used as archive sites nowadays, finding
 the software you need in such a distributed environment can be
 difficult: you may know that it is out there somewhere, but not where.
 The School of Computer Science at McGill University has one solution to
 the problem - "archie".
 

 Getting To The Point:
 ---------------------
 
 So how do you get to use archie? If you are Internet connected, it's
 easy. Telnet to quiche.cs.mcgill.ca (132.206.2.3 or 132.206.51.1) and
 log in as user "archie". There's no password, although we do log the
 sessions to provide rudimentary statistics. You should get a banner
 message and a status report on our latest additions. "help" gets a list
 of valid commands. Feedback is welcome and can be sent to
 archie@cs.mcgill.ca (please see the list of current limitations below).

 Please note that at the time of writing the database has not been fully
 built: there are still some 50 (out of over 600) sites to be entered.

 The long version, for those who are interested...
 

 What is archie ?
 ----------------

 Archie is a pair of software tools. The first maintains a list of about
 600 Internet ftp archive sites. Each night, software runs an anonymous
 ftp session to a subset of these sites and fetches a recursive directory
 listing of each. We hit about 1/30th of the list each night, so each
 site gets hit about once a month, which we hope balances timely updates
 against unnecessary network load. The listings are stored in compressed
 form on one of our machines, quiche.cs.mcgill.ca (132.206.2.3), where
 they are made available to the entire Internet community via anonymous
 ftp in the directory ~ftp/archie/listings.
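 The nightly rotation described above can be sketched as follows. This is
 a minimal illustration, not archie's scheduling code: it assumes the site
 list is simply partitioned by position into 30 groups, and the function
 name `sites_for_night` and the example host names are hypothetical. With
 exactly 600 sites, each night would fetch 20 of them.

```python
# Hypothetical sketch, not archie's scheduler: spread the ftp site list
# over a 30-night cycle so each site is fetched about once a month.
from typing import List

CYCLE_NIGHTS = 30  # assumption, from "about 1/30th of the list each time"


def sites_for_night(sites: List[str], night: int,
                    cycle: int = CYCLE_NIGHTS) -> List[str]:
    """Return the sites to fetch on a given night (nights count from 0).

    Site i is assigned to night i % cycle, so each site is fetched exactly
    once per cycle and each night gets roughly len(sites) / cycle sites.
    """
    return [site for i, site in enumerate(sites) if i % cycle == night % cycle]


if __name__ == "__main__":
    # Hypothetical host names standing in for the ~600 real archive sites.
    all_sites = [f"ftp{i}.example.edu" for i in range(600)]
    print(len(sites_for_night(all_sites, night=0)))  # 20
    # Over one full cycle, every site is visited exactly once.
    visited = [s for n in range(CYCLE_NIGHTS)
               for s in sites_for_night(all_sites, n)]
    print(sorted(visited) == sorted(all_sites))  # True
```

 A fixed modulo assignment keeps the nightly load even and guarantees
 that no site is skipped or fetched twice within a cycle.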
 
 The second tool is the interesting one as far as users are concerned.
 It consists of a program running on a dummy user account that allows
 outsiders to log onto the archive host and query the database. This is
 in fact the program we call archie.

 We have just finished implementing a database scheme that both greatly
 speeds up lookups and reduces storage requirements.

 Users can ask archie to search for specific name strings. For example,
 "prog kcl" would find all occurrences of the string "kcl" and tell you
 which hosts have entries containing this string, the size of each
 program, its last modification date and where it can be found on the
 host, along with some other useful information. In this example, you
 would thus find the archive sites that store Kyoto Common Lisp. With
 one central database for all the archive sites we know about, archie
 greatly speeds the task of finding a specific program on the net.
 
 Archie is currently running in "proof of concept" mode, but it has
 already proved popular: we are getting up to 50 queries a day, and the
 number is increasing daily as word spreads. We still consider archie to
 be in "beta", and there are a number of features to be added before it
 becomes fully operational.
 
 At present archie has two useful commands. The first ("prog") lets you
 specify a search string (using "ed" regular expressions for wildcard
 matching). The second ("site") lets you ask for a site listing by name;
 it lists all programs at the given site, in effect reconstructing the
 original recursive listing. There is also a rudimentary help facility,
 but not much more.
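 The two commands can be sketched over an in-memory table of listing
 entries. This is a minimal illustration, not archie's code: the `Entry`
 record, the `prog` and `site` function names and the example hosts are
 hypothetical, matching is assumed to apply to the file name (the last
 path component) only, and Python's `re` module stands in for "ed"
 regular expressions, whose syntax overlaps with but is not identical to
 Python's.

```python
# Hypothetical sketch of archie's "prog" and "site" commands over an
# in-memory table; not the archie implementation. Python's `re` stands
# in for "ed" regular expressions (similar, but not identical, syntax).
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    host: str   # archive site holding the file
    path: str   # full path of the file on that host
    size: int   # size in bytes
    mtime: str  # last modification date as given in the listing


def prog(db: List[Entry], pattern: str) -> List[Entry]:
    """Entries whose file name (last path component) matches pattern.

    Matching is case sensitive, as in the current archie server.
    """
    rx = re.compile(pattern)
    return [e for e in db if rx.search(e.path.rsplit("/", 1)[-1])]


def site(db: List[Entry], host: str) -> List[Entry]:
    """All entries for one host, sorted by path to rebuild its listing."""
    return sorted((e for e in db if e.host == host), key=lambda e: e.path)


if __name__ == "__main__":
    db = [
        Entry("ftp.example.edu", "/pub/lisp/kcl.tar.Z", 812345, "Oct 3 1990"),
        Entry("ftp.example.edu", "/pub/gnu/emacs.tar.Z", 1900000, "Sep 1 1990"),
        Entry("archive.example.de", "/pub/kcl/kcl-doc.tar.Z", 99000, "Aug 12 1990"),
    ]
    for e in prog(db, "kcl"):  # like typing "prog kcl"
        print(e.host, e.path, e.size, e.mtime)
    for e in site(db, "ftp.example.edu"):  # like "site ftp.example.edu"
        print(e.path)
```

 Because "prog" scans every entry, its cost grows with the size of the
 whole database, while "site" only needs the entries for one host.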


 Limitations and future plans
 ----------------------------

 These are limitations that we plan to address in the near to
 not-so-near future.

 - The regular expressions are CASE SENSITIVE. We need to be able
   to turn this off if desired. (near)

 - Only UNIX sites are included in the database. We don't yet have a
   parser for VMS or VM sites (near)

 - Exact match lookups (as opposed to regular expressions) are much faster
   given the current database architecture. However the user interface
   currently doesn't allow access to this feature (near)

 - No list of the sites currently maintained is available (near)

 - Searches cannot be limited to specific sites. We hope to allow this
   using regular expressions to specify the site names (e.g. "prog *kcl
   *.de" would find the kcl sites from among the German archive sites) (near)

 - Retrieval scripts are brain-damaged (near) 

 - No email interface (hopefully near)

 - No X11 interface (not quite so near)
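 Two of the planned features above, fast exact-match lookups and
 case-insensitive or site-restricted searches, can be sketched as
 follows. This is a hypothetical illustration, not archie's database
 scheme: an in-memory dictionary keyed by file name stands in for the
 real index, and `build_index`, `exact` and `search` are names invented
 for this example. It shows why an exact match is cheaper: it is a
 single lookup, while a regular expression must be tested against every
 entry.

```python
# Hypothetical sketch of two planned features, not archie's database:
# exact-match lookup through a name index, and regex search with optional
# case folding and a site-name restriction.
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Hit = Tuple[str, str]  # (host, path) pair taken from the site listings


def build_index(listing: List[Hit]) -> Dict[str, List[Hit]]:
    """Map each file name (last path component) to every place it appears."""
    index: Dict[str, List[Hit]] = defaultdict(list)
    for host, path in listing:
        index[path.rsplit("/", 1)[-1]].append((host, path))
    return dict(index)


def exact(index: Dict[str, List[Hit]], name: str) -> List[Hit]:
    """Exact match: a single dictionary lookup, no scan of all entries."""
    return index.get(name, [])


def search(listing: List[Hit], pattern: str, ignore_case: bool = False,
           site_pattern: Optional[str] = None) -> List[Hit]:
    """Regex match: every entry must be tested, so this is slower."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    site_rx = re.compile(site_pattern) if site_pattern else None
    return [(host, path) for host, path in listing
            if rx.search(path.rsplit("/", 1)[-1])
            and (site_rx is None or site_rx.search(host))]
```

 In this sketch, the proposed "prog *kcl *.de" query would correspond
 roughly to `search(listing, "kcl", site_pattern=r"\.de$")`, with the
 site pattern written as a Python regular expression rather than a
 shell-style wildcard.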


 One may now also use a pager on the output so that things don't go flying
 past you as they did before.
 
 Currently we are not releasing the software which provides this service:
 there still remains much work to be done on it, and it is non-portable at
 the moment. The aim was to get archie up and running and see what kind
 of response we got before sending it out to the world at large. Even
 though the database is now about half the size it was in the version of
 archie we have been running for the past couple of months, it still
 occupies about 40 MB, and updates and searches can still put a
 noticeable load on a Sun 4/280 that is doing little else.
 
 Other things? We hope to distribute archie to a few key sites around the
 world. A few sites in Europe and North America have volunteered, and
 having several servers would lessen the backbone load and ensure
 reachability. We plan to do this once we get a bit more code in place.
 A graphical interface (for X Windows and maybe NeXTstep) would be nice.
 
 We welcome feedback, comments and suggestions from both satisfied and
 unsatisfied customers and this can be sent to archie@cs.mcgill.ca.


-----------------------------------------------------------------------------
Alan Emtage,                    "Ashore it's wine, women and song;
McGill University, CANADA        aboard it's rum, bum and concertina"
					-19th Century British Naval Saying

INTERNET: bajan@cs.mcgill.ca    UUCP: ...!mit-eddie!musocs!bajan
	  listmaster@cs.mcgill.ca
BITNET:	  bajan@musocs.BITNET
-----------------------------------------------------------------------------