[comp.archives] [wanted...] archie V2.0 man page

bajan@cs.mcgill.ca (Alan Emtage) (03/23/91)

Archive-name: ftp/database/archie-doc/1991-03-22
Archive-directory: quiche.cs.mcgill.ca:/archie/doc/ [132.206.2.3]
Original-posting-by: bajan@cs.mcgill.ca (Alan Emtage)
Original-subject: archie V2.0 man page (NON *roff version)
Reposted-by: emv@msen.com (Edward Vielmetti, MSEN)


The following is a (currently correct) version of the man page for
archie V2.0. Since archie is currently still under development, revisions
will be posted from time to time. Any typos are to be blamed on me :-)
Please send any corrections to

        archie-admin@cs.mcgill.ca

This can also be obtained from quiche.cs.mcgill.ca (132.206.2.3) in the
~ftp/archie/doc directory as archie.man.txt

-Alan



---8<----------8<----------8<----------8<----------8<----------8<----------8<

ARCHIE(1L)        MISC. REFERENCE MANUAL PAGES         ARCHIE(1L)


NAME
     archie - an Internet archive server listing service

SYNOPSIS
     archie

DESCRIPTION
     The archie system is a program which can  query  a  database
     maintained  by  the  Computer  Science  Department of McGill
     University.  The database contains a list of software  which
     is available by means of anonymous ftp(1) to hosts connected
     to the Internet network.

     The system can be accessed in an interactive fashion or  via
     electronic  mail  (email).  In order to use the interactive
     system:

     1)   Connect to  host  quiche.cs.mcgill.ca  (132.206.2.3  or
          132.206.51.1) with telnet(1).

     2)   Login  as  user  archie  (no  capitals,   no   password
          required).   The  system  prints  a  banner message and
          status report.

     3)   Type ``help'' for further information.

     In order to use the email interface, send requests to

               archie@cs.mcgill.ca

     Send the word ``help'' in a message for  available  commands
     and  features.  Please note that this is an automated inter-
     face: no human sees it. See "THE  EMAIL  INTERFACE"  section
     below.

     Comments and suggestions should be sent to

               archie-l@cs.mcgill.ca

     Administrative requests such as adding a site to the database
     or  modifying  the  Software  Description Database should be
     sent to

               archie-admin@cs.mcgill.ca

THE INTERACTIVE INTERFACE
     Variables

     archie has a number of variables which modify its  behavior.
     The  values  of these variables may be changed using the set
     command.  archie distinguishes between three types of  vari-
     able:

     boolean
          which may be either set or unset.

     numeric
          representing an integer within a pre-determined range.

     string
          whose value is a string of characters (which may or may
          not be restricted).

     The following variables are currently recognized:

     autologout

          By default, archie will exit after  one  hour  of  idle
          time.  This value can be changed through this variable,
          which represents, in minutes, the length of  idle  time
          before you are automatically logged out.

          The  minimum  and  maximum  values  are  1   and   300,
          representing one minute through five hours.

          Example:

             set autologout 45

          will cause you to be automatically logged out after  45
          minutes of idle time.


     mailto

          A string variable whose value is  a  mail  address,  or
          comma-separated list of addresses. Note that there must
          not be any spaces within the list of addresses. If this
          is  set  and  the  mail command is issued with no argu-
          ments, then the output of the last command is mailed to
          that address.

          Example:

             set mailto user@frobozz.com

          Example:

             set mailto user1@hello.edu,user2@goodbye.com

          All the various Internet addressing styles  are  under-
          stood. BITNET sites should use the convention

             user@sitename.bitnet

          UUCP addresses can be specified as

              user@sitename.uucp
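
          The address-list rule above (comma-separated, no spaces)
          can be sketched in Python. This is only an illustration
          of the rule as stated; the function name parse_mailto is
          hypothetical and is not part of archie.

```python
def parse_mailto(value: str) -> list[str]:
    """Split a mailto-style value into addresses (hypothetical helper)."""
    # The man page forbids spaces anywhere in the list.
    if any(ch.isspace() for ch in value):
        raise ValueError("address list must not contain spaces")
    addresses = value.split(",")
    # Assumption: an empty entry (e.g. a trailing comma) is an error.
    if any(addr == "" for addr in addresses):
        raise ValueError("empty address in list")
    return addresses


print(parse_mailto("user1@hello.edu,user2@goodbye.com"))
```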

     maxhits

          A numeric variable whose value is the maximum number of
          matches you want the prog command to generate.

          If archie seems to be slow, or you don't want a lot  of
          output, this can be set to a small value.  ``maxhits''
          must be within the range 0 to 1000.  The default  value
          is 1000.

          Example:

             set maxhits 100

          prog will now stop after 100 matches have been found.
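
          The range checks on numeric variables might look roughly
          like the following Python sketch. The ranges and defaults
          are those given above for autologout and maxhits; the
          names NUMERIC_RANGES and set_numeric are hypothetical,
          and how archie itself stores variables is not documented
          here.

```python
# Ranges as documented above; the table itself is a hypothetical structure.
NUMERIC_RANGES = {
    "autologout": (1, 300),  # minutes of idle time
    "maxhits": (0, 1000),    # maximum number of matches from prog
}


def set_numeric(variables: dict, name: str, raw: str) -> None:
    """Validate and store a numeric variable (illustrative only)."""
    low, high = NUMERIC_RANGES[name]
    value = int(raw)  # raises ValueError if raw is not an integer
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}")
    variables[name] = value


settings = {"autologout": 60, "maxhits": 1000}  # documented defaults
set_numeric(settings, "autologout", "45")
set_numeric(settings, "maxhits", "100")
print(settings)
```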

     pager

          A boolean variable which, when  set,  tells  archie  to
          filter  all  output  through  the pager less(1L).  When
          using the pager you may also want to set the term vari-
          able to your terminal type (see term variable).

          Example:

             set pager

     search

          This variable determines the kind of  search  performed
          on  the  database  by  the  prog  command,   providing
          flexibility in search times and types.


          search is a string variable whose value is one  of  the
          following:

          sub

               Substring (case insensitive). A  simple,  everyday
               substring search.  A match occurs if the  file  (or
               directory)  name  in  the  database  contains  the
               user-given substring.

               Example:

                    The pattern ``is'' will match ``islington'',
                    ``this'' and ``poison''.

          subcase

               Substring (case sensitive). As above but the  case
               of the strings involved becomes significant.

               Example:

                   ``TeX'' will match ``LaTeX'' but not ``Latex''
               or ``TExTroff''.

          exact

               Exact match. The fastest  search  method  of  all.
               The restriction is that the user string (the argu-
               ment to the prog command)  has  to  exactly  match
               (including  case) the string in the database. This
               is provided for those of you who know  just  what
               you are looking for.

               For example, if you wanted to know where  all  the
               ``xlock.tar.Z''  files  were,  this is the kind of
               search to use.

          regex

               This is the default search method.   Searches  the
               database  with  the  user (search) string which is
               given in the form of an ed(1) regular expression.

               NOTE: Unless specifically anchored to  the  begin-
               ning  (with  ^)  or  end (with $) of a line, ed(1)
               regular  expressions  (effectively)  have   ``.*''
               prepended and appended to them. For example, it is
               not necessary to say

                    prog .*xnlock.*

               since

                    prog xnlock

               will suffice. Thus the regex match becomes a  sim-
               ple substring match.
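
          The four search types can be compared side by side in a
          short Python sketch. It is an approximation only: the
          function name matches is hypothetical, and Python's re
          module stands in for ed(1) regular expressions, whose
          syntax differs in places (for example in how braces are
          written).

```python
import re


def matches(mode: str, pattern: str, name: str) -> bool:
    """Return True if name matches pattern under the given search type."""
    if mode == "sub":
        return pattern.lower() in name.lower()  # case-insensitive substring
    if mode == "subcase":
        return pattern in name                  # case-sensitive substring
    if mode == "exact":
        return pattern == name                  # whole name, case included
    if mode == "regex":
        # Unanchored search, so "xnlock" behaves like ".*xnlock.*".
        return re.search(pattern, name) is not None
    raise ValueError(f"unknown search mode: {mode}")


for name in ("islington", "this", "poison"):
    print(name, matches("sub", "is", name))
for name in ("LaTeX", "Latex", "TExTroff"):
    print(name, matches("subcase", "TeX", name))
```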

     sortby

          This variable describes how the output  from  the  prog
          command  is  to be ordered. It can have one of 5 values
          (and their associated reverse orders). For each method,
          the  ``natural''  sort order (or at least, what we con-
          sider to be the natural order) is the default.

          hostname

               Output is sorted on the archive hostname in  lexi-
               cal order.

               Reverse order rhostname

          time

               Output is sorted with the most recent modification
               times  of  the  found  file/directory names coming
               first (youngest -> oldest).

               Reverse order rtime

          size

               Output  is  sorted  by  the  size  of  the   found
               files/directories, largest first.

               Reverse order rsize

          filename

               Sorted in file/directory name lexical order.

               Reverse order rfilename

          none

               This is the DEFAULT order.

               Unsorted. There is no reverse order although rnone
               is accepted for symmetry.

          Typing the keyboard  interrupt  character  (Ctrl-C  for
          most  people  on  UNIX)  during a search will cause the
          search to be aborted.  The results found up to that time
          will be sorted (as determined by the value of the sortby
          variable) and output.  The output phase may  itself  be
          aborted by typing the interrupt character a second time.
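
          The sort orders above, each with its ``r'' prefixed
          reverse, can be sketched in Python as follows. The record
          fields (host, name, size, modified) and the sample
          entries are invented for illustration; archie's internal
          record layout is not described in this page.

```python
from datetime import date

# For each sortby value: (key function, whether the natural order is descending).
SORT_KEYS = {
    "hostname": (lambda r: r["host"], False),
    "time": (lambda r: r["modified"], True),  # youngest first
    "size": (lambda r: r["size"], True),      # largest first
    "filename": (lambda r: r["name"], False),
}


def sort_results(results, sortby="none"):
    """Order prog results by a sortby value (illustrative only)."""
    if sortby in ("none", "rnone"):
        return list(results)  # unsorted; rnone is accepted for symmetry
    reverse_requested = sortby.startswith("r") and sortby[1:] in SORT_KEYS
    name = sortby[1:] if reverse_requested else sortby
    key, descending = SORT_KEYS[name]
    return sorted(results, key=key, reverse=descending != reverse_requested)


# Hypothetical sample records.
results = [
    {"host": "b.example.edu", "name": "xlock.tar.Z", "size": 200, "modified": date(1991, 1, 5)},
    {"host": "a.example.edu", "name": "archie.man.txt", "size": 900, "modified": date(1990, 6, 1)},
    {"host": "c.example.edu", "name": "less.tar.Z", "size": 50, "modified": date(1991, 3, 1)},
]
for order in ("hostname", "time", "rsize"):
    print(order, [r["host"] for r in sort_results(results, order)])
```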

     status

          This boolean variable  determines  if  the  status-line
          will  be  displayed while the prog command is searching
          through the database. If  set  (which  is  the  default
          value) then the number of matches and percentage of the
          database searched is displayed. Otherwise no output  is
          given until the search is complete.

     term  This variable tells archie what type of  terminal  you
          are using, and optionally its size in rows and columns.
          This information is used by the pager.

          The usage is:

             set term <terminal-type> [<#rows> [<#columns>]]

          That is, the terminal type is required, but the  number
          of  rows  and  columns  is optional.  You may specify a
          value for rows only, but if  you  want  to  change  the
          number  of  columns you must give a value for both rows
          and columns.  The default values for rows  and  columns
          are 24 and 80.

          Examples:

             set term vt100

             set term xterm 60

             set term xterm 24 100
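
          The argument rules for term (type required, rows
          optional, columns only together with rows) can be
          sketched in Python. The function name parse_term is
          hypothetical; the defaults of 24 and 80 are those stated
          above.

```python
def parse_term(args: str):
    """Parse the arguments of 'set term' into (type, rows, columns)."""
    fields = args.split()
    if not 1 <= len(fields) <= 3:
        raise ValueError("usage: set term <terminal-type> [<#rows> [<#columns>]]")
    rows = int(fields[1]) if len(fields) >= 2 else 24     # default rows
    columns = int(fields[2]) if len(fields) == 3 else 80  # default columns
    return fields[0], rows, columns


for example in ("vt100", "xterm 60", "xterm 24 100"):
    print(example, "->", parse_term(example))
```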



     Regular Expressions

          archie uses ed(1) regular expressions in  a  number  of
          commands.

          A regular expression, on the one hand, is a string like
          any  other;  a  sequence  of  characters.  On the other
          hand, special characters within the string have certain
          functions  which  make  regular expressions useful when
          trying to match portions of other strings.  In the fol-
          lowing  discussion  and examples, a string containing a
          regular expression will be called the ``pattern'',  and
          the  string against which it is to be matched is called
          the ``reference string''.

          Regular expressions  allow  one  to  search  for  ``all
          strings ending with the letters ize'' or ``all  strings
          beginning with a number between 1 and 3 and ending in a
          comma''.

          In order to accomplish this, regular expressions co-opt
          the  use  of  some  characters to have special meaning.
          They also provide for these characters  to  lose  their
          special  meaning  if the user so desires. The rules for
          regular expressions are:


     c    Any character c  matches  itself  unless  it  has  been
          assigned  other  special  meaning as listed below. Most
          special characters can be escaped (made to  lose  their
          special meaning) by placing the character '\' in  front
          of them. This doesn't apply to '{', which is non-special
          until it is escaped. Thus, although  '*'  normally  has
          special meaning, the pattern '\*' matches a literal '*'.

          Example:

          The pattern

               acdef

          matches

               s83acdeffff or acdefsecs

          but not

               accdef or aacde1f

          That is, it will match any reference string that contains
          ``acdef'' anywhere within it.

          Example:

          Normally the characters '*' and '$' are  special,  but
          the pattern

               a\*bse\$

          acts as above. That is, any reference string containing
          ``a*bse$'' as a substring will be flagged as a match.



     .     A period matches  any  character  except  the  newline
          character. This is known as the wildcard character.

          Example:

               The pattern

                ....

          will match any 4 characters in  the  reference  string,
          except a newline character.


     ^    If `^' appears at the beginning of the pattern then  it
          is said to ``anchor'' the match to the beginning of the
          line. That is, the reference string must start with the
          pattern  following  the  `^'. If this character appears
          anywhere other than at the beginning  of  the  pattern,
          then  it  is  no longer considered special, and matches
          itself as any non-special character would. Similarly if
          it starts a pattern but is escaped, it matches itself.

          Example:

          The pattern

               ^efghi

          Will match

               efghi or efghijlk

          but not

               abcefghi

          That is the pattern will  match  only  those  reference
          strings  starting  with  ``efghi''. Just containing the
          substring is not sufficient.


     $     Occurring at the end of the  pattern,  this  character
          ``anchors'' the pattern to the end  of  the  line  (the
          reference string). A '$' occurring anywhere else in the
          pattern is regarded as non-special. Similarly if it  is
          at the end of the pattern but is escaped, it is
          non-special.

          Example:

          The pattern

               efghi$

          Will match

               efghi or abcdefghi

          but not

               efghijkl

          That is the pattern will  match  only  those  reference
          strings ending with ``efghi''. Just containing the sub-
          string is not sufficient.
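
          The two anchor examples can be tried out with Python's
          re module, used here only as a stand-in for ed(1). For
          these simple patterns the two behave alike, although
          Python's `$' also matches just before a trailing newline.
          The helper name ed_search is hypothetical.

```python
import re


def ed_search(pattern: str, reference: str) -> bool:
    """Unanchored search, approximating how archie applies a pattern."""
    return re.search(pattern, reference) is not None


for reference in ("efghi", "efghijlk", "abcefghi"):
    print("^efghi", reference, ed_search("^efghi", reference))
for reference in ("efghi", "abcdefghi", "efghijkl"):
    print("efghi$", reference, ed_search("efghi$", reference))
```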


     \<    This sequence in the pattern causes the one  character
          regular expression following it only to match something
          at the beginning of a word: at the beginning of a line,
          or just before a letter, digit or  underline  character
          and just after a character which is not one of these.

          Example:

               The pattern

               \<abc

          would match the last 'abc' in the reference string

               @hijabc#+abc

          but not the first since the first 'abc' did  not  start
          on a ``word'' boundary.
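
          The word-boundary example can be approximated in Python,
          whose re module has no \< or \> but offers \b, which
          matches at either edge of a word. For a pattern that
          begins (or ends) with a word character, \b placed before
          (or after) it behaves like \< (or \>). This is a sketch
          of the idea, not of ed(1) itself; the second reference
          string is an invented example.

```python
import re

# \babc stands in for the ed(1) pattern \<abc from the example above.
reference = "@hijabc#+abc"
start_match = re.search(r"\babc", reference)
print(start_match.start())  # index where the matching 'abc' begins

# abc\b stands in for an 'abc' constrained to the end of a word.
end_match = re.search(r"abc\b", "abcdef+abc")
print(end_match.start())
```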


     \>    Constrains the one-character regular  expression  fol-
          lowing  it  to  be  at the end of a ``word'' as defined
          above.


     [string]

          One or more characters within  square  brackets.   This
          pattern  matches any single character within the brack-
          ets. The caret, '^', has a special meaning if it is the
          first  character  in the series: the pattern will match
          any character other than one in the list.

          Example:

               The pattern

               [^abc]

          Will match any character except 'a', 'b' or 'c'.

          To match a right bracket, ']', in the list it  must  be
          put first:

               []ab01]

          For a caret, '^', in the list it  can  appear  anywhere
          but first.

          In

               [ab^01]

          the caret loses its special meaning.


          The '-' character is special within square brackets. It
          is  interpreted  as a range of characters (in the ASCII
          character set) and  will  match  any  single  character
          within  that  range.   '[a-z]'  matches  any lower case
          letter. The '-' can be made non special by  placing  it
          first or last within the square brackets.


          The characters '$', '*' and '.' are not special  within
          square brackets.


          Example:

               The pattern

               [ab01]

          matches a single occurrence of a character from the set
          'a', 'b', '0', '1'.

          Example:

               The pattern

               [^ab01]

          will match any single character other  than  'a',  'b',
          '0', '1'.


          Example :

               The pattern

               [a0-9b]

          matches one of 'a', 'b' or a digit between 0 and 9
          inclusive.

          Example :

               The pattern

               [^a0-9b.$]


          means any single character other than 'a', 'b', '.', '$'
          or a digit between 0 and 9 inclusive.
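
          The bracket examples above can be checked with a small
          Python sketch. Python's re module happens to treat these
          particular bracket expressions the same way as described
          here, but it is only a stand-in for ed(1). The helper
          name matching_chars and the sample string are
          hypothetical.

```python
import re


def matching_chars(bracket_expr: str, candidates: str) -> str:
    """Return the candidate characters that the bracket expression matches."""
    return "".join(ch for ch in candidates if re.fullmatch(bracket_expr, ch))


sample = "ab01^]x.$5"
for expr in ("[ab01]", "[^ab01]", "[]ab01]", "[ab^01]", "[a0-9b]", "[^a0-9b.$]"):
    print(expr, "matches", matching_chars(expr, sample))
```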

     *     An asterisk following a regular expression in the pat-
          tern   has   the   effect  of  matching  zero  or  more
          occurrences of that expression.

          Example:

               The pattern

               a*

          means zero or more occurrences of the character 'a'.


          Example:

               The pattern

               [A-Z]*

          means zero or more occurrences of upper case letters.




     \{m\}

     \{m,\}

     \{m,n\}

          A one-character regular expression followed by  one  of
          these three constructions causes a range of occurrences
          of that regular expression to be matched.
          If  it  is  followed by \{m\} where m is a non-negative
          integer between 0 and 255 (inclusive), then  exactly  m
          occurrences  of that regular expression are matched. If
          followed by \{m,\}, then at  least  m  occurrences  are
          matched.   Finally, if it is followed by \{m,n\} (where
          n is a non-negative integer between 0 and 255 and where
          n > m), then between m and n occurrences of the expres-
          sion are matched.

          Example:

               The pattern

               ab\{3\}

          would match any substring in the reference string of an
          'a' followed by exactly 3 'b's.

          Example:

               The pattern

               ab\{3,\}

          would match any substring in the reference string of an
          'a' followed by at least 3 'b's.


          Example:

               The pattern

               ab\{3,5\}

          would match any substring in the reference string of an
          'a' followed by at least 3 but at most 5 'b's.
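
          Python writes intervals with bare braces, the reverse of
          the ed(1) convention, so the examples above need a small
          translation before they can be tried with Python's re
          module. The sketch below swaps the two spellings; the
          function name ed_intervals_to_python is hypothetical and
          it handles only the brace convention, not every
          difference between the two syntaxes.

```python
import re


def ed_intervals_to_python(ed_pattern: str) -> str:
    """Swap ed-style \\{ \\} intervals for Python-style { } (braces only)."""
    out = []
    i = 0
    while i < len(ed_pattern):
        ch = ed_pattern[i]
        if ch == "\\" and i + 1 < len(ed_pattern):
            nxt = ed_pattern[i + 1]
            # \{ and \} delimit an interval in ed; Python spells them bare.
            out.append(nxt if nxt in "{}" else ch + nxt)
            i += 2
        elif ch in "{}":
            # A bare brace is an ordinary character in ed, so escape it.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


for ed_pattern in (r"ab\{3\}", r"ab\{3,\}", r"ab\{3,5\}"):
    translated = ed_intervals_to_python(ed_pattern)
    found = re.search(translated, "xxabbbbbbbyy")  # an 'a' followed by 7 'b's
    print(ed_pattern, "->", translated, "matched", found.group())
```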


          Common Problems with Regular Expressions


     (1)  When matching a substring it is not  necessary  to  use
          the wildcard character  to  match  the  parts  of  the
          reference string preceding and following the substring.

          Example:

               The pattern

               abcd

          will match any reference string  containing  this  pat-
          tern. It is not necessary to use

                .*abcd.*

          as the pattern.


     (2)  In order to constrain a pattern to match the entire
          reference string, use the construction:

               ^pattern$


     (3)  The easiest way to obtain  case  insensitivity  in  a
          regular expression is to use  the  '[]'  operator.  For
          example, a pattern to match the word ``hello'' regardless
          of the case of the letters would be:

               [Hh][Ee][Ll][Ll][Oo]
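
          Writing such patterns by hand is tedious for longer
          words, so one might generate them. The Python sketch
          below does this for letters only; the function name
          case_insensitive_pattern is hypothetical, and any
          non-letter is passed through unchanged, so special
          characters would still need escaping by hand.

```python
import re


def case_insensitive_pattern(word: str) -> str:
    """Build a bracket-per-letter pattern that ignores letter case."""
    parts = []
    for ch in word:
        if ch.isalpha():
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            parts.append(ch)  # non-letters are passed through unchanged
    return "".join(parts)


pattern = case_insensitive_pattern("hello")
print(pattern)
print(bool(re.search(pattern, "say HeLLo")))
```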


     Commands

          Arguments to commands shown  here  in  square  brackets
          '[]' are optional. All others are mandatory.

     help
          List the valid archie commands.

     list [pattern]
          This command provides a list  of  the  sites  currently
          stored  in the database and the time at which they were
          last updated.  There is an optional regular  expression
          argument to limit the list to specific sites.

          Note that the numerical (IP) address associated with  a
          site  name  is valid at the listed time, but since they
          do  occasionally  change,  it  is   possible   that   a
          discrepancy may occur until that site is updated in our
          database. Furthermore, the listed  IP  address  is  the
          primary,  as  listed  in  the  DNS  database: secondary
          addresses are not stored.

          Example:

               list

          will list all sites in the database, while

               list \.de$

          lists all German sites.
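
          The effect of the pattern argument can be sketched in
          Python, with the re module standing in for ed(1). The
          site names below are invented examples, and list_sites is
          a hypothetical name. Note that the escaped period and the
          `$' anchor keep a name that merely contains ``.de'' from
          matching.

```python
import re

# Hypothetical site names; the real list comes from the archie database.
sites = [
    "ftp.example.de",
    "archive.example.edu",
    "ftp.dev.example.com",
    "src.example.de",
]


def list_sites(sites, pattern=None):
    """Return all sites, or only those whose name matches pattern."""
    if pattern is None:
        return list(sites)
    return [site for site in sites if re.search(pattern, site)]


print(list_sites(sites, r"\.de$"))
```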

     mail [address1,[address2...]]
          With an argument (or arguments) the output of the  last
          command  is  mailed  to  the  specified   address   or
          comma-separated list of addresses.  No spaces may appear
          anywhere in the address list.

          Example:

               mail user1@hello.edu,user2@goodbye.com

          Without an argument the output of the last  command  is
          sent to the address specified in the mailto variable.

          Example:

               mail

          All the various Internet addressing styles  are  under-
          stood. BITNET sites should use the convention

               user@sitename.bitnet

          UUCP addresses can be specified as

               user@sitename.uucp

     prog pattern
          Find all occurrences of programs  with  names  matching
          pattern.  How  pattern  is  interpreted  depends on the
          value of the search variable.   The  output  lists  the
          names  of  hosts with matching entries, the size of the
          matching program, its last modification  date  and  its
          path.

          The results are sorted according to the  value  of  the
          sortby variable, and are limited in number by the maxhits
          variable.

     set variable-name [value]
          This command allows you to set one  of  archie's  vari-
          ables.   Their  values affect how archie interacts with
          the user.

          boolean variables are either set or unset

          Example:

               set pager

          numeric variables take a number within a certain range

          Example:

               set maxhits 500

          string variables take a  (possibly  restricted)  string
          value

          Example:

               set sortby time


          See the entries on unset and show.



     show [variable-name]
          This command is used  to  display  the  value  of  a
          particular variable, or of all variables.  With an
          argument it displays the value of that variable; without
          an argument it displays the values of all variables.

          Example:

             show maxhits

     site sitename
          This command allows you to get a  full  listing  of  an
          ftp(1)  site in the archie database.  The output format
          is similar to that of UNIX ls(1) long  recursive  (-lR)
          listing.

          Example:

             site col.hp.com

     unset variable
          This causes the specified variable to  have  no  value.
          This  means that it will not be used by archie until it
          has been given a value with the set command.

          Note: this may cause ``counter-intuitive'' behaviour in
          some cases (e.g. in the case of maxhits).  Although one
          might expect prog to print matches without  regard  for
          any limit, this is not the case.  If the value of maxhits
          is not available, prog will merely fall  back  to  some
          internal default.

     whatis substring
          This  command searches the archie Software  Description
          Database  for  the  given  substring,  with  case being
          ignored. This database  consists  of  names  and  short
          descriptions  of  many  of the software packages, docu-
          ments (like RFCs and  educational  material)  and  data
          files that are stored on the Internet.

          Example:

             whatis uucp

          in part gives as a result:

               findpath.sh             UUCP Pathfinder
               logfile-stats           UUCP LOGFILE analyzer
               mapstats                UUCP map statistics program

          We welcome and encourage additions and  corrections  to
          this  database  and depend on the archie user community
          to keep it up to date. To make your contribution to this
          database, mail to


                    archie-admin@cs.mcgill.ca

          For new additions, please keep the  description  to  25
          words or less.


THE EMAIL INTERFACE
     The archie email interface currently accepts a limited  sub-
     set of the interactive interface commands, plus a few of its
     own. Currently variables are  not  supported  in  the  email
     interface.


     Requests to this server should be addressed to

                    archie@cs.mcgill.ca

     Note that the ``Subject:'' line in  incoming  mail  is  pro-
     cessed  as if it were part of the main message body. No spe-
     cial keywords are required.

     Note that the help command is exclusive. All other  commands
     in the same message are ignored.

     The server recognizes the following commands. If  a  message
     not  containing  any  valid  requests or an empty message is
     received, it will be considered to be a help request.


     path path
          This lets the requestor override the address that would
          normally  be  extracted from the header.  If you do not
          hear from the archive server within a couple of  hours,
          you might consider adding a path command to your request.
          The  path  describes  how  to  mail  a   message   from
          cs.mcgill.ca  to  your  address.  cs.mcgill.ca is fully
          connected to the Internet.


          BITNET users can use the convention

               user@site.bitnet

          UUCP users can use the convention

               user@site.uucp


     help Will send you a message describing how to use the email
          interface (basically this section).


     prog <reg exp1> [<reg exp2> ...]

          A search of the archie database is performed with  each
          <reg exp> (a regular expression as defined by ed(1)) in
          turn, and any matches found are returned to the reques-
          tor.  Note that multiple <reg exp> may be placed on one
          line, in which case the results will be mailed back  to
          you  in  one message.  If you have multiple prog lines,
          then multiple messages will be returned, one  for  each
          line  [This  doesn't  work as expected at the moment...
          stay tuned].

          Any regular expression containing spaces must be quoted
          with  single  (') or double (") quotes. ALL OTHER ed(1)
          rules must be followed.

          NOTE: The searches are CASE SENSITIVE. The  ability  to
          change this will hopefully be added soon.

          The prog command is currently executed as if the search
          variable were set to regex.


     site <site name> | <site IP address>

          A listing of the given <site name>  will  be  returned.
          The  fully  qualified  domain name or IP address may be
          used.


     compress

          ALL of your files in the current mail message will be run
          through  compress(1)  and uuencode(1). When you receive
          the reply, remove everything before the ``begin''  line
          and run it through uudecode(1).  This will produce a .Z
          file. You can then run uncompress(1) on this  file  and
          get the results of your request.
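
          For readers without uudecode(1) to hand, the first half
          of this procedure can be sketched in Python using the
          standard binascii module. The function name
          extract_uuencoded is hypothetical, and the sketch assumes
          the reply contains a conventional ``begin <mode> <name>''
          line and a closing ``end'' line. The Python standard
          library has no decoder for compress(1) output, so the
          resulting .Z data must still be passed to uncompress(1).

```python
import binascii


def extract_uuencoded(reply: str) -> tuple[str, bytes]:
    """Return (filename, decoded bytes) from a uuencoded mail reply."""
    lines = reply.splitlines()
    # Drop everything before the "begin" line, as described above.
    start = next(i for i, line in enumerate(lines) if line.startswith("begin "))
    _, _mode, filename = lines[start].split(" ", 2)
    data = bytearray()
    for line in lines[start + 1:]:
        if line == "end":
            break
        if line.strip() in ("", "`"):
            continue  # zero-length terminator line carries no data
        data += binascii.a2b_uu(line)
    return filename, bytes(data)
```

          The returned bytes would then be written to a file with
          the returned name and uncompressed as described above.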



     quit Nothing past this point is interpreted.  This  is  pro-
          vided  so that the occasional lost soul whose signature
          contains a line that looks like a command can still use
          the server without getting a bogus response.
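
          The rules given in this section (the Subject: line is
          treated as part of the body, help is exclusive, an empty
          or unrecognized message counts as a help request, and
          nothing after quit is interpreted) can be summarized in a
          Python sketch. It is a reading of the text above, not the
          server's actual code: the function name parse_request is
          hypothetical, and the sketch assumes that command names
          are lower case and appear as the first word of a line.

```python
KNOWN_COMMANDS = {"path", "help", "prog", "site", "compress", "quit"}


def parse_request(subject: str, body: str) -> list[tuple[str, str]]:
    """Return the (command, argument) pairs a message would produce."""
    commands = []
    # The Subject: line is processed as if it were part of the body.
    for line in [subject, *body.splitlines()]:
        words = line.split(None, 1)
        if not words or words[0] not in KNOWN_COMMANDS:
            continue  # not a recognized command; skipped
        name = words[0]
        argument = words[1].strip() if len(words) > 1 else ""
        if name == "quit":
            break  # nothing past this point is interpreted
        commands.append((name, argument))
    # help is exclusive; a message with no valid requests is a help request.
    if not commands or any(name == "help" for name, _ in commands):
        return [("help", "")]
    return commands


print(parse_request("prog xlock", "site col.hp.com\nquit\nprog ignored"))
```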



THE ARCHIE DATABASE
     The archie database subsystem maintains a list of about  600
     Internet  ftp(1)  archive  sites.   Each night, the database
     subsystem executes an anonymous ftp(1) to a subset of  these
     sites  and  fetches a recursive directory listing (or a file
     containing the recursive directory listing if this  exists).

     Currently,  each  site  gets  updated  approximately  once a
     month.    The   directory    listings    are    stored    on
     quiche.cs.mcgill.ca  (132.206.2.3), where they are available
     to the Internet community via anonymous ftp(1).  They appear
     in the directory ~ftp/archie/listings in compressed form.

BUGS
     1)   Only UNIX sites are included in the database.

     2)   The user can not limit searches to specific sites.

     3)   There is no graphical user interface.

     4)   There is no way to abort the help facility completely.

     It is hoped that all these will change in coming versions.


LONG TERM PLANS
     The archie system is regarded as  being  ``in  development''
     and  is not being released to outside sites at present.  The
     current database requires about 70 MB of disk  storage,  and
     the  updates  and  searches put a noticeable load on the Sun
     4/280 on which it is  operating.   Eventually,  we  hope  to
     distribute archie to several sites around the world.

     We welcome comments and suggestions;  please  send  them  to
     archie-l@cs.mcgill.ca.

SEE ALSO
     ftp(1), telnet(1)

AUTHORS
     Alan Emtage (bajan@cs.mcgill.ca), McGill University.

     Bill Heelan (wheelan@cs.mcgill.ca), McGill University.


     Manual page by R. P. C. Rodgers, UCSF  School  of  Pharmacy,
     San Francisco, California 94143 (rodgers@maxwell.mmwb.ucsf.edu)
     and Alan Emtage.