bajan@opus.cs.mcgill.ca (Alan Emtage) (11/15/90)
McGill School of Computer Science Operating "archie"
------------------------------------------------------
- An Internet Archive Server Listing Service
----------------------------------------------

Given the number of hosts now being used as archive sites, finding needed software in a distributed environment can be very difficult. You may know that the software you need is out there, but it can sometimes be hard to find. The School of Computer Science at McGill University has one solution to the problem: "archie".

Getting To The Point:
---------------------

So how do you get to use archie? If you are Internet connected, it's easy. Telnet to quiche.cs.mcgill.ca (132.206.2.3 or 132.206.51.1) and log in as user "archie" (there's no password, although we do log the sessions to provide rudimentary statistics). You should get a banner message and a status report on our latest additions. "help" gets a list of valid commands.

Feedback is welcome and can be sent to archie@cs.mcgill.ca (please see the list of current limitations below).

Please note that at the time of writing the database has not been fully built: some 50 of the more than 600 sites have still to be entered.

Long version for those who are interested...

What is archie ?
----------------

Archie is a pair of software tools. The first maintains a list of about 600 Internet ftp archive sites. Each night it performs an anonymous ftp to a subset of these sites and fetches a recursive directory listing from each. We hit about 1/30th of the list each night, so each site is visited about once a month. We hope this balances timely updates against unnecessary network load. The listings are stored on one of our machines, quiche.cs.mcgill.ca (132.206.2.3). There they are made available in compressed form to the entire Internet community via anonymous ftp, in the directory ~ftp/archie/listings.

The second tool is the interesting one as far as users are concerned.
It consists of a program running under a dummy user account that allows outsiders to log onto the archive host and query the database. This is in fact the program we call archie. We have just finished implementing a database scheme that both greatly speeds lookup and reduces storage requirements.

Users can ask archie to search for specific name strings. For example, "prog kcl" would find all occurrences of the string "kcl" and tell you which hosts have entries containing it. For each entry it also gives the size of the program, its last modification date, where it can be found on the host, and some other useful information. In this example, you could thus find the archive sites that are storing Kyoto Common Lisp. With one central database for all the archive sites we know about, archie greatly speeds the task of finding a specific program on the net.

Archie is currently running in "proof of concept" mode, but it has already proved popular: we are getting up to 50 queries a day, and the number is increasing daily as word spreads. We still consider archie to be in "beta", and a number of features must be added before it becomes fully operational.

At present archie has two useful commands. The first ("prog") lets you specify a search string, using "ed" regular expressions for wildcard matching. The second ("site") lets you ask for a site listing by name. It lists all programs at the given site, in effect reconstructing the original recursive listing. There is also a rudimentary help facility, but not much more.

Limitations and future plans
----------------------------

These are limitations that we plan to change in the near to not-so-near future.

- The regular expressions are CASE SENSITIVE. We need to be able to turn this off if desired. (near)
- Only UNIX sites are included in the database. We don't yet have a parser for VMS or VM sites. (near)
- Exact match lookups (as opposed to regular expressions) are much faster given the current database architecture.
  However, the user interface currently doesn't allow access to this feature. (near)
- We don't provide a list of the sites currently maintained. (near)
- Searches cannot be limited to specific sites. We hope to use regular expressions to specify the site names (e.g. "prog *kcl *.de" would find the kcl sites among the German archive sites). (near)
- The retrieval scripts are brain-damaged. (near)
- No email interface. (hopefully near)
- No X11 interface. (not quite so near)

One may now also use a pager on the output, so that things don't go flying past you as they did before.

Currently we are not releasing the software that provides this service. Much work remains to be done on it, and it is non-portable at the moment. The aim was to get archie up and running and see what kind of response we got before sending it out to the world at large. The database is now about half the size of the version we have been running for the past couple of months. Even so, it still occupies about 40 MB, and updates and searches can still put a noticeable load on a Sun 4/280 that is doing little else.

Other things? We hope to distribute archie to a few key sites around the world. A few sites in Europe and North America have volunteered. Having several servers would lessen the backbone load and ensure reachability. We plan to do this once a bit more code is in place. A graphical interface (for X Windows and maybe NeXTstep) would be nice.

We welcome feedback, comments and suggestions from both satisfied and unsatisfied customers. These can be sent to archie@cs.mcgill.ca.
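The two tools described above can be made concrete with a small toy model, sketched below in Python. It is not the McGill software, which has not been released. The following are all hypothetical: the names (tonights_batch, Entry, ListingDB, prog, site, exact), the stride-based nightly rotation, the in-memory file-name index, and the sample hosts and files. Python's re syntax also differs from the "ed" regular expressions that archie uses.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


def tonights_batch(sites, night, cycle=30):
    """Hypothetical nightly rotation: every cycle-th site, offset by night.

    With cycle=30 each night touches about 1/30th of the list, and each
    site comes up once every 30 nights (roughly once a month).
    """
    return sites[night % cycle::cycle]


@dataclass(frozen=True)
class Entry:
    """One file from a site's recursive ftp listing (illustrative fields)."""
    host: str
    path: str      # full path on the remote host
    size: int      # size in bytes
    modified: str  # last-modification date as given in the listing


def basename(path):
    """Last component of a UNIX-style path."""
    return path.rsplit("/", 1)[-1]


class ListingDB:
    """Toy in-memory stand-in for the archie database (hypothetical design)."""

    def __init__(self, entries):
        self.entries = list(entries)
        # Exact-match index: file name -> entries. A dictionary lookup
        # avoids the full scan that a regular-expression search needs.
        self.by_name = defaultdict(list)
        for e in self.entries:
            self.by_name[basename(e.path)].append(e)

    def prog(self, pattern, ignore_case=False, host_pattern=None):
        """Regex search over file names, in the spirit of "prog".

        ignore_case models the planned case-insensitive option;
        host_pattern models the planned restriction to matching sites.
        """
        rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
        hrx = re.compile(host_pattern) if host_pattern else None
        return [e for e in self.entries
                if rx.search(basename(e.path))
                and (hrx is None or hrx.search(e.host))]

    def exact(self, name):
        """Exact file-name lookup through the index, without a scan."""
        return list(self.by_name.get(name, []))

    def site(self, host):
        """Every entry for one host, sorted by path (like "site")."""
        return sorted((e for e in self.entries if e.host == host),
                      key=lambda e: e.path)


# Usage with made-up sample data:
db = ListingDB([
    Entry("host-a.example.edu", "/pub/lisp/kcl-src.tar.Z", 1843200, "1990-10-02"),
    Entry("host-a.example.edu", "/pub/gnu/emacs.tar.Z", 3145728, "1990-08-17"),
    Entry("host-b.example.de", "/pub/lisp/KCL/kcl.tar.Z", 1720320, "1990-09-09"),
])
for e in db.prog("kcl"):
    print(e.host, e.path, e.size, e.modified)
print([e.path for e in db.site("host-a.example.edu")])
```

The file-name index shows one way an exact-match lookup can beat a regular-expression scan. How archie's own database is organized is not described in this note, so treat the index as an illustration only.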
-----------------------------------------------------------------------------
Alan Emtage,                    "Ashore it's wine, women and song;
McGill University, CANADA        aboard it's rum, bum and concertina"
                                      - 19th Century British Naval Saying
INTERNET: bajan@cs.mcgill.ca
UUCP: ...!mit-eddie!musocs!bajan            listmaster@cs.mcgill.ca
BITNET: bajan@musocs.BITNET
-----------------------------------------------------------------------------