bill@twwells.uucp (T. William Wells) (10/25/88)
Contained herein is my first attempt at the database structure for
which comp.archives is intended to be the input. I am also going to
describe the comp.archives postings used to maintain the database.
None of this is cast in stone, and critiques are welcome.
Here is the example archive site entry from my previous message.
Following it is a line-by-line description.
NM twwells.UUCP
EN bill@twwells.UUCP (T. William Wells) 1988 Oct 21
AD bill@twwells.UUCP (T. William Wells)
MA 781 W. Oakland Pk Blvd #208, Ft. Lauderdale FL 33311
CO uucp:uucp:~:twwells Any1800-0800 ACU 2400 13059876543 in:-\r-in: arcuucp
DE This is where comp.archives gets moderated from. I maintain the
DE most up-to-date version of the databases, so if you want
DE them you have to get them directly from me.
NM twwells.UUCP
This is the site name.
EN bill@twwells.UUCP (T. William Wells) 1988 Oct 21
This is the person responsible for the entry and the date on
which the entry was added or updated.
AD bill@twwells.UUCP (T. William Wells)
This is the person who is responsible for the archive. He
may or may not be the uucp, news, or system administrator.
There can be more than one of these.
MA 781 W. Oakland Pk Blvd #208, Ft. Lauderdale FL 33311
The mailing address for help or information. Don't include
this unless you want snail-mail. People who mail to this
address had better include a SASE or e-mail address or forget
about getting any response.
CO uucp:uucp:~:twwells Any1800-0800 ACU 2400 13059876543 in:-\r-in: arcuucp
This contains the information needed to access the archive.
There can be several of these, depending on how many ways
your site can be accessed.
Each line starts with a tag that identifies the access method.
This is used when not all of your archived information is
available through all paths to your site. For example, you
might have a mail based server for small items but require a
direct link for larger things. Each item that you list as
available through your archive has a tag that is used to
indicate which way it can be accessed.
There may be more than one line for a single tag. This would
mean that there is more than one way to get to the same set
of information.
The next field describes the access method. This would be
something like "uucp", or "ftp", or "mail", or whatever.
The remaining fields depend on the access method. Since I am
only familiar with uucp, I am only going to describe the
fields for it. I definitely want input on what is necessary
for other access methods.
There are two fields for uucp access. The first is the path
name which archive file names are relative to. The second is
an L.sys entry that would be used to access your site.
DE This is where comp.archives gets moderated from. I maintain the
DE most up-to-date version of the databases, so if you want
DE them you have to get them directly from me.
This is a short description of your site. You might also
include any special information about your archives; for
example, if you are willing to make tapes you would say so
here.
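The tagged-line layout described above lends itself to a very small parser. What follows is only a Python sketch, not part of the proposal: the name `parse_entry` is made up for illustration, and it assumes the two-letter tag is always separated from its value by a single space.

```python
from collections import defaultdict

def parse_entry(lines):
    """Group the lines of one entry by their two-letter tag."""
    fields = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        # Everything before the first space is the tag, the rest is the value.
        tag, _, value = line.partition(" ")
        fields[tag].append(value.strip())
    return dict(fields)

entry = parse_entry([
    "NM twwells.UUCP",
    "AD bill@twwells.UUCP (T. William Wells)",
    "DE This is where comp.archives gets moderated from. I maintain the",
    "DE most up-to-date version of the databases, so if you want",
    "DE them you have to get them directly from me.",
])
print(entry["NM"])       # ['twwells.UUCP']
print(len(entry["DE"]))  # 3
```

Keeping a list per tag means repeated lines (several AD, CO, or DE lines) need no special handling.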
---
Here is a sample entry for the archived information database. Note
that I made this up from a cursory examination of Pcomm; don't take
it as gospel.
NM unix-pcomm
VR version 1.1
AU egray@fthood.UUCP (Emmet P. Gray)
MA egray@fthood.UUCP (Emmet P. Gray)
EN bill@twwells.UUCP (T. William Wells) 1988 Oct 21
TT public domain version of ProComm (TM)
KW all-source,public-domain,datacomm
SY any:modem,sysv-unix:termcaps,install
DE Pcomm is a public domain telecommunication program for Unix that
DE is designed to operate similar to the MSDOS program, ProComm.
DE ProComm (TM) is copyrighted by Datastorm Technologies, Inc. This
DE is a completely new program and contains no ProComm source code.
DE This is not a Datastorm product.
Here is a line-by-line description:
NM unix-pcomm
The name of the item. If the item is a program that ports to
one environment, the name is that environment hyphenated with
the program name; otherwise it is just the name. Note that
this is not intended to be useful by itself, e.g., unix-pcomm
might eventually also refer to something that has been made
to work under VMS. Should there be two items with the same
name, the later item will have its author's name appended.
For example, should John Turkey later write a pcomm for
UNIX, it would be called unix-pcomm-turkey.
VR version 1.1
Some kind of version stamp. If the item does not have
versions, this is the date released or published, or
something else indicating when the item came into existence.
AU egray@fthood.UUCP (Emmet P. Gray)
This is the person or persons who wrote the thing. If there
is more than one author, use more than one line.
MA egray@fthood.UUCP (Emmet P. Gray)
This is who is maintaining the item. If the item is not
being maintained, don't add this line. If several people are
maintaining it, use several lines. Note that anyone whose
name is on one of these lines can expect e-mail about the
item.
EN bill@twwells.UUCP (T. William Wells) 1988 Oct 21
This is the person responsible for the entry and the date on
which the entry was added or updated.
TT public domain version of ProComm (TM)
A title for the item.
KW all-source,public-domain,datacomm
Keywords describing the item. Note the `all-source' keyword,
which means that all the source (other than that of the tools
mentioned below) needed is included. Note also the
public-domain keyword, which indicates that the item is in
the public domain.
SY any:modem,sysv-unix:termcaps,install
For each system this item runs on (or must be used on),
there should be one of these lines. The fields are:
1) The hardware it runs on. If it runs on any hardware
which a particular OS runs on, the entry is `any'.
Required additional hardware is indicated by
:<hardware>.
2) The OS it runs under. There are several generic names
like the `sysv-unix' above. Optional OS things which are
needed are indicated the same way hardware options are.
Also, software which is not listed in this directory which
is needed to make this go is listed here. Multiple
entries are separated by semicolons. For example, if this
is a Dbase-II program, you'd have MS-DOS;Dbase-II in this
field.
3) How much effort is needed to make it go. If following the
directions is sufficient, the entry is `install'.
4) This entry contains any tools, not normally available on
your system, which one must have in order to build or use
this item. All items which are in this section must also
have their own entries in the information directory.
There may be more than one of these lines, whenever necessary.
DE Pcomm is a public domain telecommunication program for Unix that
DE is designed to operate similar to the MSDOS program, ProComm.
DE ProComm (TM) is copyrighted by Datastorm Technologies, Inc. This
DE is a completely new program and contains no ProComm source code.
DE This is not a Datastorm product.
This is a short description of the item. This should be kept
brief; putting the man page here is probably not appropriate.
Here is another entry that would go in the information database.
NM free-distribution-database
VR updated continuously
AU bill@twwells.UUCP (T. William Wells)
MA bill@twwells.UUCP (T. William Wells)
EN bill@twwells.UUCP (T. William Wells) 19880926
TT Database of freely distributable, electronically accessible information.
KW database,public-domain
SY any,any,install
DE This database is constructed from the information that passes
DE through comp.archives. It contains information on any software,
DE databases, documents, or what-have-you, that is both freely
DE distributable and available electronically. "Freely
DE distributable" means that, if you have a copy of the item, you
DE can (at least) make exact copies and give them away, and you
DE don't have to tell the owner of the item (if any) that you have
DE done so. "Electronically available" means that it is either
DE accessible through a publicly accessible network, or is available
DE by a means that does not involve paying a fee to the
DE distributor. This information is provided as a free service and
DE there is *no one* guaranteeing that any of it is accurate or
DE useful. Use it at your own risk.
---
Here is the meat of the database: the index of things available from
each archive site. This is the format:
archive-name;version;site-name;access-type;access-handle;date;tools;comments
`Archive-name' and `version' match entries in the main
database. If this file is not in the database, leave the
fields blank. Note that this means that you can make
available archive information about things not in the
directory; however, this practice is discouraged.
`Site-name' is the name of the site, as recorded in the site
database.
`Access-type' is one of the access tags specified in the site
entry. Note that this is in the style of UNIX file names:
wild cards are permitted.
`Access-handle' is used with the information from the site
entry to construct the request from the archive. For
example, using uucp, if the site entry contained
/usr/archives as the path to which file names are relative,
and this field contains foobar.shar, then the path name you
should use to get this item is /usr/archives/foobar.shar.
`Date' is the date on which this entry was added to the
database.
`Tools' is a list of programs needed to unarchive the file;
each must be a name in the info database. Standard system
utilities are not listed.
`Comments' is anything useful to add.
For example, suppose I have pcomm sitting around in my directories.
I could have these records:
unix-pcomm;version 1.1;twwells;*;pcomm.1.shar.Z;1988 Oct 21;compress;part 1
unix-pcomm;version 1.1;twwells;*;pcomm.2.shar.Z;1988 Oct 21;compress;part 2
unix-pcomm;version 1.1;twwells;*;pcomm.3.shar.Z;1988 Oct 21;compress;part 3
unix-pcomm;version 1.1;twwells;*;pcomm.4.shar.Z;1988 Oct 21;compress;part 4
unix-pcomm;version 1.1;twwells;*;pcomm.5.shar.Z;1988 Oct 21;compress;part 5
unix-pcomm;version 1.1;twwells;*;pcomm.6.shar.Z;1988 Oct 21;compress;part 6
unix-pcomm;version 1.1;twwells;*;pcomm.7.shar.Z;1988 Oct 21;compress;part 7
unix-pcomm;version 1.1;twwells;*;pcomm.8.shar.Z;1988 Oct 21;compress;part 8
unix-pcomm;version 1.1;twwells;*;pcomm.p1.shar.Z;1988 Oct 21;compress;patch 1
unix-pcomm;version 1.1;twwells;*;pcomm.p2.shar.Z;1988 Oct 21;compress;patch 2
unix-pcomm;version 1.1;twwells;*;pcomm.p3.shar.Z;1988 Oct 21;compress;patch 3
This says that
various pieces of unix-pcomm, version 1.1 are available from my site
they can be accessed through any way that my site can be accessed
the various pieces of it can be accessed with names beginning with pcomm
the entries were added on October 21, 1988
you need compress to unarchive any of it
parts 1-8 and patches 1-3 are available
Now, suppose that I had a list of local BBS's that I was willing
to make available. It would have an entry like:
;;twwells;*;bbslist;2001 Jan 1;;bbs systems in south Florida
This says that the file bbslist is available but that it has no entry
in the information database.
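As a rough illustration of how the eight-field index record and the site's path combine, here is a Python sketch. The names `IndexRecord`, `parse_index_line`, `request_path`, and `tag_matches` are hypothetical, and the use of `fnmatchcase` for the "UNIX file name style" wild cards is an assumption about how the matching would work.

```python
from collections import namedtuple
from fnmatch import fnmatchcase
import posixpath

IndexRecord = namedtuple(
    "IndexRecord",
    "name version site access_type handle date tools comments",
)

def parse_index_line(line):
    # maxsplit=7 keeps any semicolons inside the trailing comments field.
    return IndexRecord(*line.rstrip("\n").split(";", 7))

def request_path(base_dir, record):
    """Join the site's base directory with the record's access handle."""
    return posixpath.join(base_dir, record.handle)

def tag_matches(site_tag, record):
    """True if the record's access-type pattern covers the given site tag."""
    return fnmatchcase(site_tag, record.access_type)

rec = parse_index_line(
    "unix-pcomm;version 1.1;twwells;*;pcomm.1.shar.Z;1988 Oct 21;compress;part 1"
)
print(rec.handle)                # pcomm.1.shar.Z
print(tag_matches("uucp", rec))  # True

bbs = parse_index_line(";;twwells;*;bbslist;2001 Jan 1;;bbs systems in south Florida")
print(bbs.name == "")            # True: no entry in the information database
```

With `/usr/archives` as the base directory and `foobar.shar` as the handle, `request_path` yields `/usr/archives/foobar.shar`, as in the description above.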
---
That leaves the problem of how to distribute this database. Here
are my goals:
1) To minimize the amount of information retransmitted
through the newsgroup. In an ideal world, the data would
get transmitted once, and everyone would thereafter query
archive sites for current copies.
2) To minimize the delay in getting the information out.
This means avoiding batching the data; it would not be
very nice to hold some archive information just because no
one else was posting at that time.
3) To minimize the pain of maintaining a database from the
information which flows through comp.archives.
The first one is the stickiest problem. If I never retransmitted any
data, sites which want to start a database would have to find someone
who was willing to let them have a copy of the database. Where would
they find this information? This means that I need to, at least,
periodically post a minimal database of sites that are archives for
the database.
Now, how do I best serve the needs of the guy who just has one thing
he is looking for? If I send the data just once, he is unlikely to
see it. The alternative is to send it periodically, with reasonably
long expiration dates, so that he can look on his system.
Anyway, for now, I will do the latter; if the volume gets too high,
then I'll look into some other method.
The second item means posting the information as soon as it comes in
and has been verified. The main drawback to this is that sometimes
the information is incorrectly sent. Putting a delay in the system
results in much of this error being corrected before it gets out. My
own feeling is to make updating the system reasonably painless, so
that if errors like this occur, they can be fixed reasonably easily.
The third item requires minimizing the information transmitted which
is used to update the database (a worthy goal of its own) and
minimizing the programming needed to maintain the database. The
first suggests sending updates as increments: if a site adds or
deletes something, only that addition or deletion gets sent, not the
whole thing. In the interests of keeping the database simple, the
whole database should be maintained in ASCII and be maintainable with
standard UNIX tools. Of course, it would be even better if the tools
needed to maintain this could be found through the database.
----
That leads to the problem of how to maintain the database. First,
the subject line is used to indicate that this is a database update
message. Such a subject line starts with the string 'DB:'. This should
make it reasonable to separate these entries from the others. The
remainder of the subject line may be used for any additional comments
I might wish to add.
The body of the message contains the database update commands.
Commands to add data look like:
@ADD <database>
and the following data is what is to be added. <database> is one of
the strings INFO, SITE, or INDEX. The new data is terminated by a
blank line.
Commands to delete data look like:
@DEL <database> <key>
The key depends on what is being deleted. Deletions from the
information database just use the item name. Deletions from the site
database use the site name. Deletions from the archive index use the
site name, the access method, and the access handle for the line to be
deleted.
There is a special command to delete all index entries for a site;
its form is:
@DELALL INDEX <site>
All of this should be reasonably easy to do; I roughed out a shell
script using sed, join, and comm that would handle this; though it
would be SLOW. However, it would be reasonably easy to write a simple
program that would be MUCH faster.
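To give a feel for how little is needed, here is a Python sketch (not the shell script mentioned above) that reads the body of a DB: message into a list of commands. The function name `parse_updates` is made up, and the key of a @DEL is kept as a raw string because the separator between its parts is not specified here.

```python
def parse_updates(body):
    """Return a list of (command, database, payload) tuples."""
    updates = []
    lines = iter(body.splitlines())
    for line in lines:
        if line.startswith("@ADD "):
            database = line.split()[1]
            data = []
            # The data to add runs up to the next blank line.
            for data_line in lines:
                if not data_line.strip():
                    break
                data.append(data_line)
            updates.append(("ADD", database, data))
        elif line.startswith("@DELALL "):
            _, database, site = line.split(None, 2)
            updates.append(("DELALL", database, site))
        elif line.startswith("@DEL "):
            _, database, key = line.split(None, 2)
            updates.append(("DEL", database, key))
    return updates

body = """@ADD INDEX
;;twwells;*;bbslist;2001 Jan 1;;bbs systems in south Florida

@DEL INFO unix-pcomm
@DELALL INDEX twwells
"""
for update in parse_updates(body):
    print(update)
```

Applying the parsed commands to the three ASCII files is left out; that part depends on how the files are keyed and sorted.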
---
OK, guys, it's your turn.
---
Bill
{uunet|novavax}!proxftl!twwells!bill
grumpy@edg1 (Eric Schwarz) (10/27/88)
I've got a question and possible problem for you concerning the
database format.
Is there a reason why you are using different field delimiters for
the 3 database entry formats? The site entry uses colons, the
information entry uses commas (with colon sub-field delimiters), and
the content entry uses semi-colons.
The site entry contains a path to the archive files, what about
archives that have multiple archive directories? You need to know
which files are in which directories. Is putting this information
in the content entry going to make it too big (I don't know how
many content entries there will be eventually)?
Apart from these two items, it looks pretty good.
Eric Schwarz
uunet!edg1!grumpy
bill@twwells.uucp (T. William Wells) (11/01/88)
In article <275@edg1.UUCP> grumpy@edg1 (Eric Schwarz) writes:
: Is there a reason why you are using different field delimiters for
: the 3 database entry formats? The site entry uses colons, the
: information entry uses commas (with colon sub-field delimiters), and
: the content entry uses semi-colons.
No, other than carelessness. I am changing the format so that
semicolons are the field delimiter, commas are the smallest subfield
delimiter, and colons are the intermediate field delimiter.
: The site entry contains a path to the archive files, what about
: archives that have multiple archive directories? You need to know
: which files are in which directories. Is putting this information
: in the content entry going to make it too big (I don't know how
: many content entries there will be eventually)?
This is one reason why there can be more than one CO line. Suppose that
I had stuff in directories /archive/foo and /archive/bar; I could
then have two CO lines:
CO foo.uucp;uucp;/archive/foo;...
CO bar.uucp;uucp;/archive/bar;...
In the content database, things in /archive/foo would have lines like:
prog;vers;mysite;foo*;foo-file;...
and things in /archive/bar would have lines like:
prog;vers;mysite;bar*;bar-file;...
This also makes it easy to tell everyone that the path has changed:
all you do is resubmit the site entry.
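The tag matching described in this reply can be sketched in a few lines of Python. The helper names `uucp_directories` and `candidate_paths` are invented for illustration, the elided "..." fields are passed through untouched, and `fnmatchcase` is only an assumption about how the wild cards would be interpreted.

```python
from fnmatch import fnmatchcase
import posixpath

def uucp_directories(co_lines):
    """Return (access_tag, directory) pairs for the uucp CO lines."""
    pairs = []
    for line in co_lines:
        # Layout assumed from the reply: CO <tag>;uucp;<directory>;<rest>
        tag, method, directory, _rest = line.split(" ", 1)[1].split(";", 3)
        if method == "uucp":
            pairs.append((tag, directory))
    return pairs

def candidate_paths(content_line, pairs):
    """Expand one content line into the full paths it may refer to."""
    fields = content_line.split(";")
    pattern, handle = fields[3], fields[4]
    return [
        posixpath.join(directory, handle)
        for tag, directory in pairs
        if fnmatchcase(tag, pattern)
    ]

site = uucp_directories([
    "CO foo.uucp;uucp;/archive/foo;...",
    "CO bar.uucp;uucp;/archive/bar;...",
])
print(candidate_paths("prog;vers;mysite;foo*;foo-file;...", site))
# ['/archive/foo/foo-file']
```

Because the directory lives only in the site entry, changing a path means resubmitting the CO line; the content lines stay as they are.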
---
Bill
{uunet|novavax}!proxftl!twwells!bill
comparc@twwells.uucp (comp.archives) (11/11/88)
This is the second attempt at the database structure. Changes are
still possible, so send in any comments you might have.
Here is a short summary of the changes from the previous version:
Lines in the database beginning with # are ignored.
The end of data in a DB: posting is signaled by a line
containing @END.
Everything in a DB: posting before the first line beginning
with an @ is ignored.
The time field in the CO line for ftp access has been changed.
A TT line has been added to the site entry format; it
contains a short title for the archive site.
A TM line has been added to the site entry format; it
specifies the best times to use the archives.
A KW line has been added to the site entry format; it contains
a list of keywords describing the archive. (The original
description said that the keywords are separated by a
semicolon; this is an error. They are separated by commas.)
An IX line has been added to the site entry format; it
contains information about the index files for the archive.
The contents lines have a new field, containing the size of
the file in K.
Some field delimiters have been changed. The CO line now uses
semicolons instead of colons. The SY line now uses semicolons
instead of commas.
---
Comments in the databases begin with a #. They are retained with the
data but are otherwise ignored.
In the line oriented databases, if there is a line that is to be left
blank, that line should still be entered, but with everything but the
keyword left blank.
---
The site database contains a series of entries separated by blank
lines. Each entry has the following lines:
NM <the site name>
EN <who added the entry and when>
TM <best times to call the site>
TT <the name of the archive>
AD <who administers the site>
MA <the administrator's mailing address>
CO <information needed to set up communications with the site>
IX <where the index files are>
KW <keywords describing the archive>
DE <description of the site>
Lines from TT to DE may be repeated as a group as often as necessary
to describe different archives at a single site. Each of the lines
from AD to DE may be repeated as often as necessary to contain the
data.
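Splitting a database file into its entries is straightforward given the blank-line separator and the # comment rule. The Python sketch below is illustrative only: `read_entries` is a made-up name, and it simply skips comment lines rather than retaining them alongside the data.

```python
def read_entries(text):
    """Split database text into entries, each a list of non-comment lines."""
    entries, current = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line: skipped in this sketch
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the entry being collected.
            entries.append(current)
            current = []
    if current:
        entries.append(current)
    return entries

sample = "# comment\nNM a\nVR\n\n\nNM b\n# another\nTT title\n"
print(read_entries(sample))
# [['NM a', 'VR'], ['NM b', 'TT title']]
```

Note that a line holding only a keyword (such as the bare `VR` above) is not blank, so it stays inside its entry.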
Following is a detailed description of each line.
NM <the site name>
This is a domain name. If you are a uucp site, you should write
this as <site>.uucp.
EN <user>@<site> (<name>) <date>
This identifies the person who entered the database entry. The
<date> is the output from the date command.
TM <time zone>;[[<day>],...<from>-<to> <load>];...
This lets people know when the best times are to use the archive.
The first field is the time zone the archive is contained in; all
times in the entry are presumed to be relative to that time zone.
<Day> is a three letter day abbreviation. The <from> and <to> are
times in 24 hr notation. <Load> is a single word describing the
load on your system at these times, the suggested words are: none,
light, moderate, heavy, swamped.
TT <the name of the archive>
A short title for the archive.
AD <user>@<site> (<name>)
The person who administers the archive. If more than one person
administers the archive, there should be more than one of these.
MA <the administrator's mailing address>
The mailing address for help or information. Leave this blank
unless you want snail-mail. People who mail to this address had
better include a SASE or e-mail address or forget about getting
any response.
CO <access-tag>;ftp;<name>;<internet address>;<directory>;<when available>
CO <access-tag>;uucp;<directory>;<L.sys entry>
This line describes each method of getting at the archive. If there
is more than one way to get at the archive, or more than one
directory containing archive information, then there will be more
than one of these lines.
The <access-tag> is used when not all of your archived information
is available through all paths to your site. For example, you
might have a mail based server for small items but require a
direct link for larger things. Each item that you list as
available through your archive has a tag that is used to indicate
which way it can be accessed.
There may be more than one line for a single tag. This would mean
that there is more than one way to get to the same set of
information.
The next field describes the access method. Right now, it is
either uucp or ftp; more will be added as needed.
The remaining fields depend on the access method.
There are two fields for uucp access. The first is the path name
which archive file names are relative to. The second is an L.sys
entry that would be used to access your site.
For ftp, the fields are the domain name for accessing the archive
(which is normally the same as the site name), the internet
address for the above, the directory where the archive information
resides, and the times when the archive is available.
If the archive is always available, leave that field blank. Otherwise,
format as [[<day>],...<from>-<to>];...
IX <access-tag>;<handle>;<size>;<date>;<tools needed to unarchive>;<comments>
This line describes the index file(s) for the archive. It is the
same format as the entries in the index database, except that the
first three fields are not present.
KW <keyword>,...
This is a list of keywords that describe what the site carries.
DE <description of the site>
This is a few lines that describe the site. This should be kept
reasonably short, but should give any information not specified in
the previous lines that might be useful to the archive user.
---
The archived information database contains a series of entries
separated by blank lines. Each entry has the following lines:
NM <name of the item>
VR <a version number>
AU <the author of the item>
MA <the maintainer of the item>
EN <who entered this into the database>
TT <a title for the item>
KW <keywords for the item>
SY <hardware and software needed for it, and how hard it is to bring it up>
DE <a short description of the item>
Following is a detailed description of each line.
NM <name>
The name of the item. If the item is a program that ports to
one environment, the name is that environment hyphenated with
the program name; otherwise it is just the name. Note that
this is not intended to be useful by itself, e.g., unix-pcomm
might eventually also refer to something that has been made to
work under VMS. Should there be two items with the same name,
the later item will have its author's name appended. For
example, should John Turkey later write a pcomm for UNIX, it
would be called unix-pcomm-turkey.
VR version <version>
VR date <date>
These tell which version this entry refers to. The first form is
used for things with named versions, the second is used for
something which is regularly updated. The date, for the second
format, is yymmdd, and specifies the date the thing was last
updated. Some things are so continuously updated that they
should not have a version; for them, leave this line blank.
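The three shapes of the VR line can be told apart mechanically. This Python sketch uses a hypothetical `parse_vr` helper and relies on `strptime` with `%y%m%d`, which is an assumption about how the two-digit year should be read (Python maps 69 through 99 to 1969 through 1999).

```python
from datetime import datetime

def parse_vr(line):
    """Return ('version', str), ('date', date), or None for a blank VR line."""
    parts = line.split(None, 2)
    if len(parts) < 3:
        return None  # bare "VR": the item has no version
    kind, value = parts[1], parts[2]
    if kind == "version":
        return ("version", value)
    if kind == "date":
        return ("date", datetime.strptime(value, "%y%m%d").date())
    raise ValueError("unrecognized VR line: " + line)

print(parse_vr("VR version 1.1"))  # ('version', '1.1')
print(parse_vr("VR date 881111"))  # ('date', datetime.date(1988, 11, 11))
print(parse_vr("VR"))              # None
```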
AU <user>@<site> (<name>)
This is the person or persons who wrote the thing. If there
is more than one author, use more than one line.
MA <user>@<site> (<name>)
This is who is maintaining the item. If the item is not being
maintained, leave this blank. If several people are
maintaining it, use several lines. Note that anyone whose name
is on one of these lines can expect e-mail about the item.
EN <user>@<site> (<name>) <date>
This is the person responsible for the entry and the date on
which the entry was added or updated.
TT <title>
A title for the item.
KW <keyword>,...
Keywords describing the item. Some good kinds of keywords:
`all-source', which means that all the source (other than that
of the tools mentioned below) needed is included;
`public-domain', which indicates that the item is in the public
domain.
SY <hardware>[:<hw add-ons>,...];<software>[:<sw add-ons>,...];
<effort-needed>;<tools-needed>
For each system this item runs on (or must be used on), there
should be one of these lines. The fields are:
1) The hardware it runs on. If it runs on any hardware which a
particular OS runs on, the entry is `any'. If the item
needs hardware other than the standard for the system, add
words for it after a colon.
2) The OS it runs under. There are several generic names like
`unix' or `sysv-unix'. Optional OS things which are needed
are indicated the same way hardware options are. Also,
software which is not listed in this database which is
needed to make this item go is listed here. For example,
were this item to be a Dbase program, this field would be:
MS-DOS:Dbase-II.
3) How much effort is needed to make it go. If following the
directions is sufficient, the entry is `install'.
4) This entry contains any tools, not normally available on
your system, which one must have in order to build or use
this item. All items which are in this section must also
have their own entries in the information directory.
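The three delimiter levels of the SY line (semicolons, then a colon, then commas) can be unpacked as in the Python sketch below. The names `split_addons` and `parse_sy` are invented, the second sample line is the earlier pcomm example rewritten by hand into the new delimiters, and treating the tools field as comma-separated is an assumption.

```python
def split_addons(field):
    """Split 'base:addon1,addon2' into (base, [addons])."""
    base, _, addons = field.partition(":")
    return base, [a for a in addons.split(",") if a]

def parse_sy(line):
    body = line.split(" ", 1)[1] if " " in line else ""
    fields = body.split(";")
    fields += [""] * (4 - len(fields))  # tolerate omitted trailing fields
    hardware, software, effort, tools = fields[:4]
    return {
        "hardware": split_addons(hardware),
        "software": split_addons(software),
        "effort": effort,
        "tools": [t for t in tools.split(",") if t],  # assumed comma-separated
    }

print(parse_sy("SY any;any;;"))
print(parse_sy("SY any:modem;sysv-unix:termcaps;install;"))
```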
DE <some text>
This is a short description of the item. This should be kept
brief; putting the man page here is not appropriate.
Here is an entry, suitable for the databases created through
comp.archives.
NM free-distribution-database
VR
AU bill@twwells.UUCP (T. William Wells)
MA bill@twwells.UUCP (T. William Wells)
EN bill@twwells.UUCP (T. William Wells) Fri Nov 11 00:56:16 EST 1988
TT Database of freely distributable, electronically accessible information.
KW database,public-domain
SY any;any;;
DE This database is constructed from the information that passes
DE through comp.archives. It contains information on any software,
DE databases, documents, or what-have-you, that is both freely
DE distributable and available electronically. "Freely
DE distributable" means that, if you have a copy of the item, you
DE can (at least) make exact copies and give them away, and you
DE don't have to tell the owner of the item (if any) that you have
DE done so. "Electronically available" means that it is either
DE accessible through a publicly accessible network, or is available
DE by a means that does not involve paying a fee to the
DE distributor. This information is provided as a free service and
DE there is *no one* guaranteeing that any of it is accurate or
DE useful. Use it at your own risk.
---
The site index ties the previous two databases together.
This is the format:
<name>;<version>;<archive>;<access-tag>;<handle>;<size>;
<date>;<tools>;<comments>
The first two fields link this entry to an entry in the info
database; they correspond to the NM and VR fields. If this
file is not listed in the database, these fields are blank.
`Archive' is the name of the site, as recorded in the site
database.
`Access-tag' is one of the access tags specified in the site
entry. Note that this is in the style of UNIX file names:
wild cards are permitted.
`Handle' is used with the information from the site entry to
construct the request from the archive. For example, using
uucp, if the site entry contained /usr/archives as the path
to which file names are relative, and this field contains
foobar.shar, then the path name you should use to get this
item is /usr/archives/foobar.shar.
`Size' is the size of the file in K.
`Date' is the date on which this entry was added to the database.
This should be yymmdd.
`Tools' is a list of programs needed to unarchive the file;
each must be a name in the info database. Standard system
utilities are not listed.
`Comments' is anything useful to add.
---
The DB: postings contain information to update the database. The
update information starts with the first line beginning with an @ and
ends with a line containing @END. Additional information, not
intended to be part of the database can be added before the first @
line or after the @END line.
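Picking the update section out of a posting might look like the following Python sketch. The name `update_section` is hypothetical, and treating "a line containing @END" as a line that holds nothing but @END is an assumption.

```python
def update_section(body):
    """Return the lines from the first '@' line up to, not including, @END."""
    lines = body.splitlines()
    start = next((i for i, l in enumerate(lines) if l.startswith("@")), None)
    if start is None:
        return []  # no update commands in this posting
    section = []
    for line in lines[start:]:
        if line.strip() == "@END":
            break
        section.append(line)
    return section

posting = (
    "Some moderator remarks.\n"
    "@DEL INFO unix-pcomm\n"
    "@DELALL INDEX twwells\n"
    "@END\n"
    "Trailing signature.\n"
)
print(update_section(posting))
# ['@DEL INFO unix-pcomm', '@DELALL INDEX twwells']
```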
Commands to add data look like:
@ADD <database>
and the following data is what is to be added. <database> is one of
the strings INFO, SITE, or INDEX. The new data is terminated by a
blank line. This blank line is required, no matter what the next
command is.
Commands to delete data look like:
@DEL <database> <key>
The key depends on what is being deleted. Deletions from the
information database just use the item name. Deletions from the site
database use the site name. Deletions from the archive index use the
site name, the access method, and the access handle for the line to be
deleted.
There is a special command to delete all index entries for a site;
its form is:
@DELALL INDEX <site>
---
Bill
{uunet|novavax}!proxftl!twwells!bill
send comp.archives postings to twwells!comp-archives
send comp.archives related mail to twwells!comp-archives-request
comparc@twwells.uucp (comp.archives) (12/01/88)
This is the third attempt at the database structure. Changes are
still possible, so send in any comments you might have.
Here is a short summary of the changes from the previous version:
1) The definition of a site is somewhat vague. What I am going to
do is to consider one set of archives under the control of a
single administrator as an archive site. This means that the
site entry won't have different sets of data for archives
located at the same site. This also means that the archive
name will be somewhat less related to the address of the
archive.
2) The access method and access tag of the CO fields have been
swapped. The access method now comes first.
---
Comments in the databases begin with a #. They are retained with the
data but are otherwise ignored.
In the line oriented databases, if there is a line that is to be left
blank, that line should still be entered, but with everything but the
keyword left blank.
---
The site database contains a series of entries separated by blank
lines. Each entry has the following lines:
NM <the site name>
EN <who added the entry and when>
TM <best times to call the site>
TT <the name of the archive>
AD <who administers the site>
MA <the administrator's mailing address>
CO <information needed to set up communications with the site>
IX <where the index files are>
KW <keywords describing the archive>
DE <description of the site>
Each of the lines from AD to DE may be repeated as often as necessary
to contain the data.
Following is a detailed description of each line.
NM <the site name>
This name should be related to the address used to find the site,
though it doesn't have to be. This should be kept fairly short.
EN <user>@<site> (<name>) <date>
This identifies the person who entered the database entry. The
<date> is the output from the date command.
TM <time zone>;[[<day>],...<from>-<to> <load>];...
This lets people know when the best times are to use the archive.
The first field is the time zone the archive is contained in; all
times in the site entry are presumed to be relative to that time
zone. <Day> is a three letter day abbreviation. The <from> and
<to> are times in 24 hr notation. <Load> is a single word
describing the load on your system at these times, the suggested
words are: none, light, moderate, heavy, swamped, best, worst.
TT <the name of the archive>
A short title for the archive.
AD <user>@<site> (<name>)
The person who administers the archive. If more than one person
administers the archive, there should be more than one of these.
MA <the administrator's mailing address>
The mailing address for help or information. Leave this blank
unless you want snail-mail. People who mail to this address had
better include a SASE or e-mail address or forget about getting
any response.
CO ftp;<access-tag>;<name>;<internet address>;<directory>;<when available>
CO uucp;<access-tag>;<directory>;<L.sys entry>
This line describes each method of getting at the archive. If there
is more than one way to get at the archive, or more than one
directory containing archive information, then there will be more
than one of these lines.
The <access-tag> is used when not all of your archived information
is available through all paths to your site. For example, you
might have a mail based server for small items but require a
direct link for larger things. Each item that you list as
available through your archive has a tag that is used to indicate
which way it can be accessed.
There may be more than one line for a single tag. This would mean
that there is more than one way to get to the same set of
information.
The first field on the line describes the access method. Right now, it is
either uucp or ftp; more will be added as needed.
The remaining fields depend on the access method.
There are two fields for uucp access. The first is the path name
which archive file names are relative to. The second is an L.sys
entry that would be used to access your site.
For ftp, the fields are the domain name for accessing the archive,
the internet address for the above, the directory where the
archive information resides, and the times when the archive is
available.
If the archive is always available, leave that field blank.
Otherwise, format as [[<day>],...<from>-<to>];...
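A sketch of how the data part of a CO line (everything after the keyword) might be split into named fields, in Python. The field names in CO_FIELDS and the function parse_co are invented for this example, and only the uucp and ftp methods are covered. The last field is allowed to contain anything, so that an L.sys entry is kept intact.

```python
# Field names are assumptions for this sketch; only the field order comes from the format.
CO_FIELDS = {
    "ftp": ["access_tag", "name", "internet_address", "directory", "when_available"],
    "uucp": ["access_tag", "directory", "lsys_entry"],
}

def parse_co(value):
    """Turn the data part of a CO line into a dict keyed by field name."""
    method, _, rest = value.partition(";")
    if method not in CO_FIELDS:
        raise ValueError("unknown access method: %r" % method)
    names = CO_FIELDS[method]
    # Split at most len(names) - 1 times so the final field is taken whole.
    parts = rest.split(";", len(names) - 1)
    if len(parts) != len(names):
        raise ValueError("expected %d fields after the method" % len(names))
    record = {"method": method}
    record.update(zip(names, parts))
    return record
```

An empty when_available field comes back as an empty string, which matches the rule that the field is left blank when the archive is always available.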
IX <access-tag>;<handle>;<size>;<date>;<tools needed to unarchive>;<comments>
This line describes the index file(s) for the archive. It is the
same format as the entries in the index database, except that the
first three fields are not present. You should also list README
files and the like.
KW <keyword>,...
This is a list of keywords that describe what the site carries.
DE <description of the site>
This is a few lines that describe the site. This should be kept
reasonably short, but should give any information not specified in
the previous lines that might be useful to the archive user.
---
The archived information database contains a series of entries
separated by blank lines. Each entry has the following lines:
NM <name of the item>
VR <a version number>
AU <the author of the item>
MA <the maintainer of the item>
EN <who entered this into the database>
TT <a title for the item>
KW <keywords for the item>
SY <hardware and software needed for it, and how hard it is to bring it up>
DE <a short description of the item>
Following is a detailed description of each line.
NM <name>
The name of the item. If the item is a program that runs in
one environment, the name is that environment hyphenated with
the program name; otherwise it is just the name. Note that
this is not intended to be useful by itself, e.g., unix-pcomm
might eventually also refer to something that has been made to
work under VMS. Should there be two items with the same name,
the later item will have its author's name appended. For
example, should John Turkey later write a pcomm for UNIX, it
would be called unix-pcomm-turkey.
VR version <version>
VR date <date>
These tell which version this entry refers to. The first form is
used for things with named versions, the second is used for
something which is regularly updated. The date, for the second
format, is yymmdd, and specifies the date the thing was last
updated. Some things are so continuously updated that they
should not have a version; for them, leave this line blank.
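The three cases of the VR line could be told apart along these lines. This is a Python sketch only; the function name parse_vr and the returned tuples are assumptions made for the example. Note that the standard library reads a two digit year of 69 through 99 as 1969 through 1999.

```python
from datetime import datetime

def parse_vr(value):
    """Classify the data part of a VR line as a named version, a date, or blank."""
    value = value.strip()
    if not value:
        return ("none", None)  # continuously updated items leave the line blank
    kind, _, rest = value.partition(" ")
    if kind == "version":
        return ("version", rest.strip())
    if kind == "date":
        # yymmdd, e.g. 881111 for 11 November 1988
        return ("date", datetime.strptime(rest.strip(), "%y%m%d").date())
    raise ValueError("VR line must start with 'version' or 'date'")
```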
AU <user>@<site> (<name>)
This is the person or persons who wrote the thing. If there
is more than one author, use more than one line.
MA <user>@<site> (<name>)
This is who is maintaining the item. If the item is not being
maintained, leave this blank. If several people are
maintaining it, use several lines. Note that anyone whose name
is on one of these lines can expect e-mail about the item.
EN <user>@<site> (<name>) <date>
This is the person responsible for the entry and the date on
which the entry was added or updated.
TT <title>
A title for the item.
KW <keyword>,...
Keywords describing the item. Some good kinds of keywords:
`all-source', which means that all the source (other than that
of the tools mentioned below) needed is included;
`public-domain', which indicates that the item is in the public
domain.
SY <hardware>[:<hw add-ons>,...];<software>[:<sw add-ons>,...];
<effort-needed>;<tools-needed>
For each system this item runs on (or must be used on), there
should be one of these lines. The fields are:
1) The hardware it runs on. If it runs on any hardware which a
particular OS runs on, the entry is `any'. If the item
needs hardware other than the standard for the system, add
words for it after a colon.
2) The OS it runs under. There are several generic names like
`unix' or `sysv-unix'. Optional OS things which are needed
are indicated the same way hardware options are. Also,
software which is not listed in this database which is
needed to make this item go is listed here. For example,
were this item to be a Dbase program, this field would be:
MS-DOS:Dbase-II.
3) How much effort is needed to make it go. If following the
directions is sufficient, the entry is `install'.
4) This entry contains any tools, not normally available on
your system, which one must have in order to build or use
this item. All items which are in this section must also
have their own entries in the information directory.
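The four fields above might be picked apart as follows. This is a Python sketch under some assumptions: the data is taken as one logical line, the tools field is assumed to be a comma separated list (the text does not say), and the name parse_sy and the dictionary keys are made up for the example.

```python
def _split_addons(field):
    """Split 'base:addon1,addon2' into the base name and a list of add-ons."""
    base, _, addons = field.partition(":")
    return base, [a for a in addons.split(",") if a]

def parse_sy(value):
    """Parse the data part of an SY line into its four fields."""
    fields = value.split(";")
    if len(fields) != 4:
        raise ValueError("an SY line needs exactly four fields")
    hardware, hw_addons = _split_addons(fields[0])
    software, sw_addons = _split_addons(fields[1])
    return {
        "hardware": hardware,
        "hw_addons": hw_addons,
        "software": software,
        "sw_addons": sw_addons,
        "effort": fields[2],
        # assumption: tools are comma separated, like the add-on lists
        "tools": [t for t in fields[3].split(",") if t],
    }
```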
DE <some text>
This is a short description of the item. This should be kept
brief; putting the man page here is not appropriate.
Here is an entry, suitable for the databases created through
comp.archives.
NM free-distribution-database
VR
AU bill@twwells.UUCP (T. William Wells)
MA bill@twwells.UUCP (T. William Wells)
EN bill@twwells.UUCP (T. William Wells) Fri Nov 11 00:56:16 EST 1988
TT Database of freely distributable, electronically accessible information.
KW database,public-domain
SY any;any;;
DE This database is constructed from the information that passes
DE through comp.archives. It contains information on any software,
DE databases, documents, or what-have-you, that is both freely
DE distributable and available electronically. "Freely
DE distributable" means that, if you have a copy of the item, you
DE can (at least) make exact copies and give them away, and you
DE don't have to tell the owner of the item (if any) that you have
DE done so. "Electronically available" means that it is either
DE accessible through a publicly accessible network, or is available
DE by a means that does not involve paying a fee to the
DE distributor. This information is provided as a free service and
DE there is *no one* guaranteeing that any of it is accurate or
DE useful. Use it at your own risk.
---
The site index ties the previous two databases together. This is the
format:
<name>;<version>;<archive>;<access-tag>;<handle>;<size>;
<date>;<tools>;<comments>
The first two fields link this entry to an entry in the info
database; they correspond to the NM and VR fields. If this
file is not listed in the database, these fields are blank.
`Archive' is the name of the site, as recorded in the site
database.
`Access-tag' is one of the access tags specified in the site
entry. Note that this is in the style of UNIX file names:
wild cards are permitted.
`handle' is used with the information from the site entry to
construct the request from the archive. For example, using
uucp, if the site entry contained /usr/archives as the path
to which file names are relative, and this field contains
foobar.shar, then the path name you should use to get this
item is /usr/archives/foobar.shar.
`Date' is the date on which this entry was added to the database.
This should be yymmdd.
`Tools' is a list of programs needed to unarchive the file;
each must be a name in the info database. Standard system
utilities are not listed.
`Comments' is anything useful to add.
---
The DB: postings contain information to update the database. The
update information starts with the first line beginning with an @ and
ends with a line containing @END. Additional information, not
intended to be part of the database, can be added before the first @
line or after the @END line.
Commands to add data look like:
@ADD <database>
and the following data is what is to be added. <database> is one of
the strings INFO, SITE, or INDEX. The new data is terminated by a
blank line. This blank line is required, no matter what the next
command is.
Commands to delete data look like:
@DEL <database> <key>
The key depends on what is being deleted. Deletions from the
information database just use the item name. Deletions from the site
database use the site name. Deletions from the archive index use the
site name, the access method, and the access handle for the line to be
deleted.
There is a special command to delete all index entries for a site;
its form is:
@DELALL INDEX <site>
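To make the shape of a DB: posting concrete, here is a small Python sketch that pulls the commands out of a posting. It is only one possible reading of the rules above: the function name parse_posting and the tuple layout are invented for the example, and the sketch does nothing with the data beyond collecting it.

```python
def parse_posting(text):
    """Return (command, database, payload) tuples found in a DB: posting."""
    commands = []
    lines = iter(text.splitlines())
    started = False
    for line in lines:
        if not started:
            if not line.startswith("@"):
                continue  # commentary before the first @ line is not database data
            started = True
        if line.startswith("@END"):
            break  # everything after @END is ignored
        if line.startswith("@ADD "):
            database = line.split()[1]
            data = []
            for body in lines:  # keeps consuming from the same iterator
                if not body.strip():
                    break  # the required blank line ends the new data
                data.append(body)
            commands.append(("ADD", database, data))
        elif line.startswith("@DELALL "):
            _, database, site = line.split(None, 2)
            commands.append(("DELALL", database, site))
        elif line.startswith("@DEL "):
            _, database, key = line.split(None, 2)
            commands.append(("DEL", database, key))
    return commands
```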
---
Bill
{uunet|novavax}!proxftl!twwells!bill
send comp.archives postings to twwells!comp-archives
send comp.archives related mail to twwells!comp-archives-request
comparc@twwells.uucp (comp.archives) (01/03/89)
Here is a short summary of the changes from the previous version:
1) Two new access methods have been added, one for fidonet and
one for BBS's.
2) All file sizes should be in K; this was not stated in the
previous version.
3) Text on the DE lines should be kept to less than 70
characters; this makes life easier for pretty-printing the
archive information.
4) Lines that have fields separated by semicolons should have all
the semicolons on the line, including trailing ones. This was
not specified in the previous version.
5) The key separator on @DEL lines is a semicolon. This was not
specified in the previous version.
---
Comments in the databases begin with a #. They are retained with the
data but are otherwise ignored.
In the line oriented databases, if there is a line that is to be left
blank, that line should still be entered, but with everything but the
keyword left blank.
Lines that have fields separated by semicolons should have all
semicolons on the line, including trailing ones.
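The semicolon rule amounts to saying that a line with N fields always carries N-1 semicolons, even when the last fields are empty. A Python sketch of writing and checking such lines follows; the helper names join_fields and split_fields are made up for the example.

```python
def join_fields(fields):
    """Join fields with semicolons, writing empty fields as nothing at all."""
    return ";".join("" if f is None else str(f) for f in fields)

def split_fields(value, expected):
    """Split a semicolon separated line and insist on the full field count."""
    parts = value.split(";")
    if len(parts) != expected:
        raise ValueError("expected %d fields, found %d" % (expected, len(parts)))
    return parts
```

For instance, four fields of which the last two are empty are written as any;any;; and a line with the trailing semicolons dropped is rejected.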
---
The site database contains a series of entries separated by blank
lines. Each entry has the following lines:
NM <the site name>
EN <who added the entry and when>
TM <best times to call the site>
TT <the name of the archive>
AD <who administers the site>
MA <the administrator's mailing address>
CO <information needed to set up communications with the site>
IX <where the index files are>
KW <keywords describing the archive>
DE <description of the site>
Each of the lines from AD to DE may be repeated as often as necessary
to contain the data.
Following is a detailed description of each line.
NM <the site name>
This name should be related to the address used to find the site,
though it doesn't have to be. This should be kept fairly short.
EN <user>@<site> (<name>) <date>
This says who the person is who entered the database entry. The
<date> is the output from the date command.
TM <time zone>;[[<day>],...<from>-<to> <load>];...
This lets people know when the best times are to use the archive.
The first field is the time zone the archive is contained in; all
times in the site entry are presumed to be relative to that time
zone. <Day> is a three letter day abbreviation. The <from> and
<to> are times in 24 hour notation. <Load> is a single word
describing the load on your system at these times; the suggested
words are: none, light, moderate, heavy, swamped, best, worst.
TT <the name of the archive>
A short title for the archive.
AD <user>@<site> (<name>)
The person who administers the archive. If more than one person
administers the archive, there should be more than one of these.
MA <the administrator's mailing address>
The mailing address for help or information. Leave this blank
unless you want snail-mail. People who mail to this address had
better include a SASE or e-mail address or forget about getting
any response.
CO ftp;<access tag>;<name>;<internet address>;<directory>;<when available>
CO uucp;<access tag>;<directory>;<L.sys entry>
CO fido;<access tag>;<access-info>
CO bbs;<access tag>;<phone>;<when available>;<modem settings>;
<protocols supported>;<comments>
This line describes each method of getting at the archive. If there
is more than one way to get at the archive, or more than one
directory containing archive information, then there will be more
than one of these lines.
The <access tag> is used when not all of your archived information
is available through all paths to your site. Suppose that you had
two archives, one of small programs that you had a mail-based
server for, and another of larger stuff that you want to transfer
only through uucp. Your CO line for mail access could have an
access tag of `mail' and your CO line for uucp access could have
an access tag of `uucp'.
Files which are available only through mail would have an access
tag of `mail'. Files available only through uucp would have an
access tag of `uucp'. Files that were available either way would
have an access tag of `*'.
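The matching of a file's access tag against the tags on the CO lines can be sketched with UNIX style wild cards. This Python example assumes that the file's tag is the pattern and the CO tags are the names it is matched against; the function name usable_co_tags is made up.

```python
from fnmatch import fnmatchcase

def usable_co_tags(file_tag, co_tags):
    """Return the CO access tags through which a file can be fetched."""
    # The file's tag is treated as a shell-style pattern, so `*' matches every tag.
    return [tag for tag in co_tags if fnmatchcase(tag, file_tag)]
```

With CO tags of mail and uucp, a file tagged mail is reachable only through the mail server, while a file tagged * is reachable either way.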
There may be more than one line for a single tag. This would mean
that there is more than one way to get to the same set of
information.
The first field on the line describes the access method. Right now, it is one
of uucp, ftp, fido, or bbs; more will be added as needed.
The remaining fields depend on the access method.
There are two fields for uucp access. The first is the path name
which archive file names are relative to. The second is an L.sys
entry that would be used to access your site.
For ftp, the fields are the domain name for accessing the archive,
the internet address for the above, the directory where the
archive information resides, and the times when the archive is
available.
If the archive is always available, leave that field blank.
Otherwise, format as [[<day>],...<from>-<to>];...
There is one field for fidonet. This is some information needed
for accessing the archive; as yet I have no idea what this info is.
There are five fields for BBS access. The first is the phone
number; if you want them, use hyphens for digit separators. The
second field indicates when the BBS is available; leave it blank
if it is always available. The modem settings are a comma
separated list of entries like: <data bits><parity><stop
bits>:<speed>. Parity is represented by one of the letters: (N)o,
(E)ven, (O)dd, (M)ark, (S)pace. The protocols supported field
indicates which protocols are available for file transfer. The
final field is for additional comments about getting into the BBS.
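The modem settings field could be decoded as in this Python sketch. It assumes single digit data and stop bit counts, so that a setting looks like 8N1:2400; the function name and the dictionary keys are invented for the example.

```python
import re

PARITY = {"N": "none", "E": "even", "O": "odd", "M": "mark", "S": "space"}

# <data bits><parity><stop bits>:<speed>, assuming one digit for each bit count
_SETTING = re.compile(r"^(\d)([NEOMS])(\d):(\d+)$")

def parse_modem_settings(field):
    """Parse a comma separated list of modem settings such as '8N1:2400'."""
    settings = []
    for item in field.split(","):
        match = _SETTING.match(item.strip())
        if match is None:
            raise ValueError("bad modem setting: %r" % item)
        data_bits, parity, stop_bits, speed = match.groups()
        settings.append({
            "data_bits": int(data_bits),
            "parity": PARITY[parity],
            "stop_bits": int(stop_bits),
            "speed": int(speed),
        })
    return settings
```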
IX <access tag>;<handle>;<size>;<date>;<tools needed to unarchive>;<comments>
This line describes the index file(s) for the archive. It is the
same format as the entries in the index database, except that the
first three fields are not present. You should also list README
files and the like. Note that the file size should be in K's.
KW <keyword>,...
This is a list of keywords that describe what the site carries.
DE <description of the site>
This is a few lines that describe the site. This should be kept
reasonably short, but should give any information not specified in
the previous lines that might be useful to the archive user. The
text on these lines should be kept to less than 70 characters.
---
The archived information database contains a series of entries
separated by blank lines. Each entry has the following lines:
NM <name of the item>
VR <a version number>
AU <the author of the item>
MA <the maintainer of the item>
EN <who entered this into the database>
TT <a title for the item>
KW <keywords for the item>
SY <hardware and software needed for it, and how hard it is to bring it up>
DE <a short description of the item>
Following is a detailed description of each line.
NM <name>
The name of the item. If the item is a program that runs in
one environment, the name is that environment hyphenated with
the program name; otherwise it is just the name. Note that
this is not intended to be useful by itself, e.g., unix-pcomm
might eventually also refer to something that has been made to
work under VMS. Should there be two items with the same name,
the later item will have its author's name appended. For
example, should John Turkey later write a pcomm for UNIX, it
would be called unix-pcomm-turkey.
VR version <version>
VR date <date>
These tell which version this entry refers to. The first form is
used for things with named versions, the second is used for
something which is regularly updated. The date, for the second
format, is yymmdd, and specifies the date the thing was last
updated. Some things are so continuously updated that they
should not have a version; for them, leave this line blank.
AU <user>@<site> (<name>)
This is the person or persons who wrote the thing. If there
is more than one author, use more than one line.
MA <user>@<site> (<name>)
This is who is maintaining the item. If the item is not being
maintained, leave this blank. If several people are
maintaining it, use several lines. Note that anyone whose name
is on one of these lines can expect e-mail about the item.
EN <user>@<site> (<name>) <date>
This is the person responsible for the entry and the date on
which the entry was added or updated.
TT <title>
A title for the item.
KW <keyword>,...
Keywords describing the item. Some good kinds of keywords:
`all-source', which means that all the source (other than that
of the tools mentioned below) needed is included;
`public-domain', which indicates that the item is in the public
domain.
SY <hardware>[:<hw add-ons>,...];<software>[:<sw add-ons>,...];
<effort-needed>;<tools-needed>
For each system this item runs on (or must be used on), there
should be one of these lines. The fields are:
1) The hardware it runs on. If it runs on any hardware which a
particular OS runs on, the entry is `any'. If the item
needs hardware other than the standard for the system, add
words for it after a colon.
2) The OS it runs under. There are several generic names like
`unix' or `sysv-unix'. Optional OS things which are needed
are indicated the same way hardware options are. Also,
software which is not listed in this database which is
needed to make this item go is listed here. For example,
were this item to be a Dbase program, this field would be:
MS-DOS:Dbase-II.
3) How much effort is needed to make it go. If following the
directions is sufficient, the entry is `install'.
4) This entry contains any tools, not normally available on
your system, which one must have in order to build or use
this item. All items which are in this section must also
have their own entries in the information directory.
DE <some text>
This is a short description of the item. This should be kept
brief; putting the man page here is not appropriate. The text
on these lines should be kept to less than 70 characters.
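One way to stay under the 70 character limit is to let a program break the description into DE lines. The Python sketch below assumes the limit applies to the text after the keyword; the function name format_de is made up for the example.

```python
import textwrap

def format_de(description, limit=70):
    """Break a description into DE lines whose text is shorter than `limit`."""
    # width is the longest line textwrap will produce, so use limit - 1
    chunks = textwrap.wrap(description, width=limit - 1)
    return ["DE " + chunk for chunk in chunks]
```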
Here is an entry, suitable for the databases created through
comp.archives.
NM free-distribution-database
VR
AU bill@twwells.UUCP (T. William Wells)
MA bill@twwells.UUCP (T. William Wells)
EN bill@twwells.UUCP (T. William Wells) Fri Nov 11 00:56:16 EST 1988
TT Database of freely distributable, electronically accessible information.
KW database,public-domain
SY any;any;;
DE This database is constructed from the information that passes
DE through comp.archives. It contains information on any software,
DE databases, documents, or what-have-you, that is both freely
DE distributable and available electronically. "Freely
DE distributable" means that, if you have a copy of the item, you
DE can (at least) make exact copies and give them away, and you
DE don't have to tell the owner of the item (if any) that you have
DE done so. "Electronically available" means that it is either
DE accessible through a publicly accessible network, or is available
DE by a means that does not involve paying a fee to the
DE distributor. This information is provided as a free service and
DE there is *no one* guaranteeing that any of it is accurate or
DE useful. Use it at your own risk.
---
The site index ties the previous two databases together. This is the
format:
<name>;<version>;<archive>;<access tag>;<handle>;<size>;
<date>;<tools>;<comments>
The first two fields link this entry to an entry in the info
database; they correspond to the NM and VR fields. If this
file is not listed in the database, these fields are blank.
`Archive' is the name of the site, as recorded in the site
database.
`Access tag' is one of the access tags specified in the site
entry. Note that this is in the style of UNIX file names:
wild cards are permitted.
`handle' is used with the information from the site entry to
construct the request from the archive. For example, using
uucp, if the site entry contained /usr/archives as the path
to which file names are relative, and this field contains
foobar.shar, then the path name you should use to get this
item is /usr/archives/foobar.shar.
`Size' is the size of the file, in K.
`Date' is the date on which this entry was added to the database.
This should be yymmdd.
`Tools' is a list of programs needed to unarchive the file;
each must be a name in the info database. Standard system
utilities are not listed.
`Comments' is anything useful to add.
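Putting the index fields together, a reader for one index line and the uucp path construction described above might look like this in Python. The field names in INDEX_FIELDS and both function names are assumptions for the sketch, and the line is taken as one logical line with all of its semicolons present.

```python
import posixpath

# Names are made up for this sketch; the order follows the format given above.
INDEX_FIELDS = ["name", "version", "archive", "access_tag", "handle",
                "size", "date", "tools", "comments"]

def parse_index_line(line):
    """Split one site index line into a dict of its nine fields."""
    # Split at most eight times so the comments field is taken whole.
    parts = line.split(";", len(INDEX_FIELDS) - 1)
    if len(parts) != len(INDEX_FIELDS):
        raise ValueError("expected %d fields" % len(INDEX_FIELDS))
    return dict(zip(INDEX_FIELDS, parts))

def uucp_path(directory, handle):
    """Combine the directory from the CO line with the handle from the index."""
    return posixpath.join(directory, handle)
```

With /usr/archives as the directory and foobar.shar as the handle, uucp_path gives /usr/archives/foobar.shar, as in the example in the text.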
---
The DB: postings contain information to update the database. The
update information starts with the first line beginning with an @ and
ends with a line containing @END. Additional information, not
intended to be part of the database, can be added before the first @
line or after the @END line.
Commands to add data look like:
@ADD <database>
and the following data is what is to be added. <database> is one of
the strings INFO, SITE, or INDEX. The new data is terminated by a
blank line. This blank line is required, no matter what the next
command is.
Commands to delete data look like:
@DEL <database> <key>
The key depends on what is being deleted. Deletions from the
information database just use the item name. Deletions from the site
database use the site name. Deletions from the archive index use the
site name, the access method, and the access handle for the line to be
deleted. Semicolons are used to separate the key fields.
There is a special command to delete all index entries for a site;
its form is:
@DELALL INDEX <site>
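For completeness, the delete commands can be generated as well as read. This Python sketch only builds the command lines from their parts; the function names are made up, and the sample site and handle in the note below are illustrative.

```python
def del_command(database, *key_fields):
    """Build an @DEL line; key fields are joined with semicolons."""
    return "@DEL %s %s" % (database, ";".join(key_fields))

def delall_index_command(site):
    """Build the command that deletes every index entry for a site."""
    return "@DELALL INDEX " + site
```

Deleting one index line for twwells.UUCP would then be written as del_command("INDEX", "twwells.UUCP", "uucp", "foobar.shar").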
---
Bill
{uunet|novavax}!proxftl!twwells!bill
send comp.archives postings to twwells!comp-archives
send comp.archives related mail to twwells!comp-archives-request