bill@twwells.uucp (T. William Wells) (10/25/88)
Contained herein is my first attempt at the database structure which comp.archives is intended to be the input to. I am also going to describe the comp.archives postings used to maintain the database. None of this is cast in stone and critiques are welcome. Here is the example archive site entry from my previous message. Following it is a line-by-line description. NM twwells.UUCP EN bill@twwells.UUCP (T. William Wells) 1988 Oct 21 AD bill@twwells.UUCP (T. William Wells) MA 781 W. Oakland Pk Blvd #208, Ft. Lauderdale FL 33311 CO uucp:uucp::twwells Any1800-0800 ACU 2400 13059876543 in:-\r-in: arcuucp DE This is where comp.archives gets moderated from. I maintain the DE most up-to-date version of the databases, so if you want DE them you have to get them directly from me. NM twwells.UUCP This is the site name. EN bill@twwells.UUCP (T. William Wells) 1988 Oct 21 This is the person responsible for the entry and the date on which the entry was added or updated. AD bill@twwells.UUCP (T. William Wells) This is the person who is responsible for the archive. He may or may not be the uucp, news, or system administrator. There can be more than one of these. MA 781 W. Oakland Pk Blvd #208, Ft. Lauderdale FL 33311 The mailing address for help or information. Don't include this unless you want snail-mail. People who mail to this address had better include a SASE or e-mail address or forget about getting any response. CO uucp:uucp:~:twwells Any1800-0800 ACU 2400 13059876543 in:-\r-in: arcuucp This contains the information needed to access the archive. There can be several of these, depending on how many ways your site can be accessed. Each line starts with a tag that identifies the access method. This is used when not all of your archived information is available through all paths to your site. For example, you might have a mail based server for small items but require a direct link for larger things. 
Each item that you list as available through your archive has a tag that is used to indicate which way it can be accessed. There may be more than one line for a single tag. This would mean that there is more than one way to get to the same set of information. The next field describes the access method. This would be something like "uucp", or "ftp", or "mail", or whatever. The remaining fields depend on the access method. Since I am only familiar with uucp, I am only going to describe the fields for it. I definitely want input on what is necessary for other access methods. There are two fields for uucp access. The first is the path name which archive file names are relative to. The second is an L.sys entry that would be used to access your site. DE This is where comp.archives gets moderated from. I maintain the DE most up-to-date version of the databases, so if you want DE them you have to get them direct from me. This is a short description of your site. You might also include any special information about your archives; for example, if you are willing to make tapes you would say so here. --- Here is a sample entry for the archived information database. Note that I made this up from a cursory examination of Pcomm, don't take it as gospel. NM unix-pcomm VR version 1.1 AU egray@fthood.UUCP (Emmet P. Gray) MA egray@fthood.UUCP (Emmet P. Gray) EN bill@twwells.UUCP (T. William Wells) 1988 Oct 21 TT public domain version of ProComm (TM) KW all-source,public-domain,datacomm SY any:modem,sysv-unix:termcaps,install DE Pcomm is a public domain telecommunication program for Unix that DE is designed to operate similar to the MSDOS program, ProComm. DE ProComm (TM) is copyrighted by Datastorm Technologies, Inc. This DE is a completely new program and contains no ProComm source code. DE This is not a Datastorm product. Here is a line-by-line description: NM unix-pcomm The name of the item. 
If the item is a program that ports to one environment, the name is that environment hyphenated with the program name; otherwise it is just the name. Note that this is not intended to be useful by itself, e.g., unix-pcomm might eventually also refer to something that has been made to work under VMS. Should there be two items with the same name, the later item will have its author's name appended. For example, should John Turkey later write a pcomm for UNIX, it would be called unix-pcomm-turkey. VR version 1.1 Some kind of version stamp. If the item does not have versions, this is the date released or published, or something else indicating when the item came into existence. AU egray@fthood.UUCP (Emmet P. Gray) This is the person or persons who wrote the thing. If there is more than one author, use more than one line. MA egray@fthood.UUCP (Emmet P. Gray) This is who is maintaining the item. If the item is not being maintained, don't add this line. If several people are maintaining it, use several lines. Note that anyone whose name is on one of these lines can expect e-mail about the item. EN bill@twwells.UUCP (T. William Wells) 1988 Oct 21 This is the person responsible for the entry and the date on which the entry was added or updated. TT public domain version of ProComm (TM) A title for the item. KW all-source,public-domain,datacomm Keywords describing the item. Note the `all-source' keyword, which means that all the source (other than that of the tools mentioned below) needed is included. Note also the public-domain keyword, which indicates that the item is in the public domain. SY any:modem,sysv-unix:termcaps,install For each system this item runs on (or must be used on), there should be one of these lines. The fields are: 1) The hardware it runs on. If it runs on any hardware which a particular OS runs on, the entry is `any'. Required additional hardware is indicated by :<hardware>. 2) The OS it runs under. 
There are several generic names like the `sysv-unix' above. Optional OS things which are needed are indicated the same way hardware options are. Also, software which is not listed in this directory which is needed to make this go is listed here. Multiple entries are separated by semicolons. For example, if this is a Dbase-II program, you'd have MS-DOS;Dbase-II in this field. 3) How much effort is needed to make it go. If following the directions is sufficient, the entry is `install'. 4) This entry contains any tools, not normally available on your system, which one must have in order to build or use this item. All items which are in this section must also have their own entries in the information directory. There may be more than one of these lines, whenever necessary. DE Pcomm is a public domain telecommunication program for Unix that DE is designed to operate similar to the MSDOS program, ProComm. DE ProComm (TM) is copyrighted by Datastorm Technologies, Inc. This DE is a completely new program and contains no ProComm source code. DE This is not a Datastorm product. This is a short descrpiton of the item. This should be kept brief; putting the man page here is probably not appropriate. Here is another entry that would go in the information database. NM free-distribution-database VR updated continuously AU bill@twwells.UUCP (T. William Wells) MA bill@twwells.UUCP (T. William Wells) EN bill@twwells.UUCP (T. William Wells) 19880926 TT Database of freely distributable, electronically accessible information. KW database,public-domain SY any,any,install DE This database is constructed from the information that passes DE through comp.archives. It contains information on any software, DE databases, documents, or what-have-you, that is both freely DE distributable and available electronically. 
"Freely DE distributable" means that, if you have a copy of the item, you DE can (at least) make exact copies and give them away, and you DE don't have to tell the owner of the item (if any) that you have DE done so. "Electronically available" means that it is either DE accessible through a publicly accessible network, or is available DE by a means that does not involve paying a fee to the DE distributor. This information is provided as a free service and DE there is *no one* guaranteeing that any of it is accurate or DE useful. Use it your own risk. --- Here is the meat of the database: the index of things available from each archive site. This is the format: archive-name;version;site-name;access-type;access-handle;date;tools;comments `Archive-name' and `version' match entries in the main database. If this file is not in the database, leave the fields blank. Note that this means that you can make available archive information about things not in the directory; however, this practiced is discouraged. `Site-name' is the name of the site, as recorded in the site database. `Access-type' is one of the access tags specified in the site entry. Note that this is in the style of UNIX file names: wild cards are permitted. `Access-handle' is used with the information from the site entry to construct the request from the archive. For example, using uucp, if the site entry contained /usr/archives as the path to which files names are relative, and this field contains foobar.shar, then the path name you should use to get this item is /usr/archives/foobar.shar. `Date' is the date which this entry was added to the database. `Tools' is a list of programs needed to unarchive the file; each must be a name in the info database. Standard system utilities are not listed. `Comments' is anything useful to add. For example, suppose I have pcomm sitting around in my directories. 
I could have these records:

unix-pcomm;version 1.1;twwells;*;pcomm.1.shar.Z;1988 Oct 21;compress;part 1
unix-pcomm;version 1.1;twwells;*;pcomm.2.shar.Z;1988 Oct 21;compress;part 2
unix-pcomm;version 1.1;twwells;*;pcomm.3.shar.Z;1988 Oct 21;compress;part 3
unix-pcomm;version 1.1;twwells;*;pcomm.4.shar.Z;1988 Oct 21;compress;part 4
unix-pcomm;version 1.1;twwells;*;pcomm.5.shar.Z;1988 Oct 21;compress;part 5
unix-pcomm;version 1.1;twwells;*;pcomm.6.shar.Z;1988 Oct 21;compress;part 6
unix-pcomm;version 1.1;twwells;*;pcomm.7.shar.Z;1988 Oct 21;compress;part 7
unix-pcomm;version 1.1;twwells;*;pcomm.8.shar.Z;1988 Oct 21;compress;part 8
unix-pcomm;version 1.1;twwells;*;pcomm.p1.shar.Z;1988 Oct 21;compress;patch 1
unix-pcomm;version 1.1;twwells;*;pcomm.p2.shar.Z;1988 Oct 21;compress;patch 2
unix-pcomm;version 1.1;twwells;*;pcomm.p3.shar.Z;1988 Oct 21;compress;patch 3

This says that:

    various pieces of unix-pcomm, version 1.1, are available from my site
    they can be accessed through any way that my site can be accessed
    the various pieces of it can be accessed with names beginning with pcomm
    the entries were added on October 21, 1988
    you need compress to unarchive any of it
    parts 1-8 and patches 1-3 are available

Now, suppose that I had a list of local BBS's that I was willing to make available. It would have an entry like:

;;twwells;*;bbslist;2001 Jan 1;;bbs systems in south Florida

This says that the file bbslist is available but that it has no entry in the information database.

---

That leaves the problem of how to distribute this database. Here are my goals:

1) To minimize the amount of information retransmitted through the newsgroup. In an ideal world, the data would get transmitted once, and everyone would thereafter query archive sites for current copies.

2) To minimize the delay in getting the information out. This means avoiding batching the data; it would not be very nice to hold some archive information just because no one else was posting at that time.
3) To minimize the pain of maintaining a database from the information which flows through comp.archives. The first one is the stickiest problem. If I never retransmitted any data, sites which want to start a database would have to find someone who was willing to let them have a copy of the database. Where would they find this information? This means that I need to, at least, periodically post a minimal database of sites that are archives for the database. Now, how do I best serve the needs of the guy who just has one thing he is looking for? If I send the data just once, he is unlikely to see it. The alternative is to send it periodically, with reasonably long expiration dates, so that he can look on his system. Anyway, for now, I will do the latter; if the volume gets too high, then I'll look into some other method. The second item means posting the information as soon as it comes in and has been verified. The main drawback to this is that sometimes the information is incorrectly sent. Putting a delay in the system results in much of this error being corrected before it gets out. My own feeling is to make updating the system reasonably painless, so that if errors like this occur, they can be fixed reasonably easily. The third item requires minimizing the information transmitted which is used to update the database (a worthy goal of its own) and minimizing the programming needed to maintain the database. The first suggests sending updates as increments: if a site adds or deletes something, only that addition or deletion gets sent, not the whole thing. In the interests of keeping the database simple, the whole database should be maintained in ASCII and be maintainable with standard UNIX tools. Of course, it would be even better if the tools needed to maintain this could be found through the database. ---- That leads to the problem of how to maintain the database. First, the subject line is used to indicate that this is a database update message. 
Such a subject line starts with the string 'DB:'. This should make it reasonably easy to separate these entries from the others. The remainder of the subject line may be used for any additional comments I might wish to add.

The body of the message contains the database update commands. Commands to add data look like:

@ADD <database>

and the following data is what is to be added. <database> is one of the strings INFO, SITE, or INDEX. The new data is terminated by a blank line.

Commands to delete data look like:

@DEL <database> <key>

The key depends on what is being deleted. Deletions from the information database just use the item name. Deletions from the site database use the site name. Deletions from the archive index use the site name, the access method, and the access handle for the line to be deleted.

There is a special command to delete all index entries for a site; its form is:

@DELALL INDEX <site>

All of this should be reasonably easy to do. I roughed out a shell script using sed, join, and comm that would handle this, though it would be SLOW. However, it would be reasonably easy to write a simple program that would be MUCH faster.

---

Ok, guys, it's your turn.

---
Bill
{uunet|novavax}!proxftl!twwells!bill
grumpy@edg1 (Eric Schwarz) (10/27/88)
I've got a question and possible problem for you concerning the database format. Is there a reason why you are using different field delimiters for the 3 database entry formats? The site entry uses colons, the information entry uses commas (with colon sub-field delimiters), and the content entry uses semi-colons. The site entry contains a path to the archive files, what about archives that have multiple archive directories? You need to know which files are in which directories. Is putting this information in the content entry going to make it too big (I don't know how many content entries there will be eventually)? Apart from these two items, it looks pretty good. Eric Schwarz uunet!edg1!grumpy
bill@twwells.uucp (T. William Wells) (11/01/88)
In article <275@edg1.UUCP> grumpy@edg1 (Eric Schwarz) writes:
: Is there a reason why you are using different field delimiters for
: the 3 database entry formats? The site entry uses colons, the
: information entry uses commas (with colon sub-field delimiters), and
: the content entry uses semi-colons.
No, other than carelessness. I am changing the format so that
semicolons are the field delimiter, commas are the smallest subfield
delimiter, and colons are the intermediate field delimiter.
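To make the three-level scheme concrete, here is a minimal sketch in Python of how a record could be taken apart under it. The function name and the sample SY value are my own illustration (the sample is the earlier Pcomm SY line rewritten by hand with the new delimiters), not something taken from the database itself.

```python
def split_record(value):
    """Split a record into fields (;), subfields (:), and items (,).

    Returns a list of fields; each field is a list of subfields;
    each subfield is a list of items.
    """
    return [[subfield.split(",") for subfield in field.split(":")]
            for field in value.split(";")]


# Hypothetical SY value using the new delimiters.
sy_fields = split_record("any:modem;sysv-unix:termcaps;install;")
hardware, software, effort, tools = sy_fields
```

An empty field (here the trailing tools field) comes back as a single empty item, so the caller still sees four fields.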
: The site entry contains a path to the archive files, what about
: archives that have multiple archive directories? You need to know
: which files are in which directories. Is putting this information
: in the content entry going to make it too big (I don't know how
: many content entries there will be eventually)?
This is one reason why there can be more than one CO line. Suppose that
I had stuff in directories /archive/foo and /archive/bar; I could
then have two CO lines:
CO foo.uucp;uucp;/archive/foo;...
CO bar.uucp;uucp;/archive/bar;...
In the content database, things in /archive/foo would have lines like:
prog;vers;mysite;foo*;foo-file;...
and things in /archive/bar would have lines like:
prog;vers;mysite;bar*;bar-file;...
This also makes it easy to tell everyone that the path has changed:
all you do is resubmit the site entry.
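As a rough sketch of how a client might tie these together, the Python below matches the access field of a content line against the tags on the CO lines and builds the full path. It assumes that "wild cards" means shell-style globbing as done by Python's fnmatch module, and that the directory and file name are simply joined with a slash; the function names are made up for illustration.

```python
from fnmatch import fnmatchcase
import posixpath


def parse_co(line):
    """Parse 'CO <tag>;<method>;<directory>;<rest>' into a dict."""
    tag, method, directory, rest = line[len("CO "):].split(";", 3)
    return {"tag": tag, "method": method, "directory": directory, "rest": rest}


def candidate_paths(co_lines, content_line):
    """Return (method, path) for every CO line whose tag matches the
    access pattern (fourth field) of the content line."""
    fields = content_line.split(";")
    pattern, handle = fields[3], fields[4]
    return [(co["method"], posixpath.join(co["directory"], handle))
            for co in map(parse_co, co_lines)
            if fnmatchcase(co["tag"], pattern)]


co_lines = ["CO foo.uucp;uucp;/archive/foo;...",
            "CO bar.uucp;uucp;/archive/bar;..."]
paths = candidate_paths(co_lines, "prog;vers;mysite;foo*;foo-file;...")
```

With the two CO lines above, the pattern foo* selects only the /archive/foo directory, and a pattern of * would select both.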
---
Bill
{uunet|novavax}!proxftl!twwells!bill
comparc@twwells.uucp (comp.archives) (11/11/88)
This is the second attempt at the database structure. Changes are still possible, so send in any comments you might have. Here is a short summary of the changes from the previous version: Lines in the database beginning with # are ignored. The end of data in a DB: posting is signaled by a line containing @END. Everything in a DB: posting before the first line beginning with an @ is ignored. The time field in the CO line for ftp access has been changed. A TT line has been added to the site entry format; it contains a short title for the archive site. A TM line has been added to the site entry format; it specifies the best times to use the archives. A KW line has been added to the site entry format; it contains a list of keywords describing the archive. (The original description said that the keywords are separated by a semicolon, this is an error: they are separated by commas.) An IX line has been added to the site entry format; it contains information about the index files for the archive. The contents lines have a new field, containing the size of the file in K. Some field delimiters have been changed. The CO line now uses semicolons instead of colons. The SY line now uses semicolons instead of commas. --- Comments in the databases begin with a #. They are retained with the data but are otherwise ignored. In the line oriented databases, if there is a line that is to be left blank, that line should still be entered, but with everything but the keyword left blank. --- The site database contains a series of entries separated by blank lines. 
Each entry has the following lines: NM <the site name> EN <who added the entry and when> TM <best times to call the site> TT <the name of the archive> AD <who administers the site> MA <the administrator's mailing address> CO <information needed to set up communications with the site> IX <where the index files> KW <keywords describing the archive> DE <description of the site> Lines from TT to DE may be repeated as a group as often as necessary to describe different archives at a single site. Each of the lines from AD to DE may be repeated as often as necessary to contain the data. Following is a detailed description of each line. NM <the site name> This is a domain name. If you are a uucp site, you should write this as <site>.uucp. EN <user>@<site> (<name>) <date> This says who the person is who entered the database entry. The <date> is the output from the date command. TM <time zone>;[[<day>],...<from>-<to> <load>];... This lets people know when the best times are to use the archive. The first field is the time zone the archive is contained in; all times in the entry are presumed to be relative to that time zone. <Day> is a three letter day abbreviation. The <from> and <to> are times in 24 hr notation. <Load> is a single word describing the load on your system at these times, the suggested words are: none, light, moderate, heavy, swamped. TT <the name of the archive> A short title for the archive. AD <user>@<site> (<name>) The person who administers the archive. If more than one person administers the archive, there should be more than one of these. MA <the administrator's mailing address> The mailing address for help or information. Leave this blank unless you want snail-mail. People who mail to this address had better include a SASE or e-mail address or forget about getting any response. 
CO <access-tag>;ftp;<name>;<internet address>;<directory>;<when available> CO <access-tag>;uucp;<directory>;<L.sys entry> This line describes each method of getting at the archive. If there is more than one way to get at the archive, or more than one directory containing archive information, then there will be more than one of these lines. The <access-tag> is used when not all of your archived information is available through all paths to your site. For example, you might have a mail based server for small items but require a direct link for larger things. Each item that you list as available through your archive has a tag that is used to indicate which way it can be accessed. There may be more than one line for a single tag. This would mean that there is more than one way to get to the same set of information. The next field describes the access method. Right now, it is either uucp or ftp; more will be added as needed. The remaining fields depend on the access method. There are two fields for uucp access. The first is the path name which archive file names are relative to. The second is an L.sys entry that would be used to access your site. For ftp, the fields are the domain name for accessing the archive (which is normally the same as the site name), the internet address for the above, the directory where the archive information resides, and the times when the archive is available. If the archive is always available, leave that field blank. Otherwise, format as [[<day>,...<from>-<to>];... IX <access-tag>;<handle>;<size>;<date>;<tools needed to unarchive>;<comments> This line describes the index file(s) for the archive. It is the same format as the entries in the index database, except that the first three fields are not present. KW <keyword>,... This is a list of keywords that describe what the site carries. DE <description of the site> This is a few lines that describe the site. 
This should be kept reasonable short, but should give any information not specified in the previous lines that might be useful to the archive user. --- The archived information database contains a series of entries separated by blank lines. Each entry has the following lines: NM <name of the item> VR <a version number> AU <the author of the item> MA <the maintainer of the item> EN <who entered this into the database> TT <a title for the item> KW <keywords for the item> SY <hardware and software needed for it, and how hard it is to bring it up> DE <a short description of the item> Following is a detailed description of each line. NM <name> The name of the item. If the item is a program that ports to one environment, the name is that environment hyphenated with the program name; otherwise it is just the name. Note that this is not intended to be useful by itself, e.g., unix-pcomm might eventually also refer to something that has been made to work under VMS. Should there be two items with the same name, the later item will have its author's name appended. For example, should John Turkey later write a pcomm for UNIX, it would be called unix-pcomm-turkey. VR version <version> VR date <date> These tell which version this entry refers to. The first form is used for things with named versions, the second is used for something which is regularly updated. The date, for the second format, is yymmdd, and specifies the date the thing was last updated. Some things are so continuously updated that they should not have a version; for them, leave this line blank. AU <user>@<site> (<name>) This is the person or persons who wrote the thing. If there is more than one author, use more than one line. MA <user>@<site> (<name>) This is who is maintaining the item. If the item is not being maintained, leave this blank. If several people are maintaining it, use several lines. Note that anyone whose name is on one of these lines can expect e-mail about the item. 
EN <user>@<site> (<name>) <date>

    This is the person responsible for the entry and the date on which the entry was added or updated.

TT <title>

    A title for the item.

KW <keyword>,...

    Keywords describing the item. Some good kinds of keywords: `all-source', which means that all the needed source (other than that of the tools mentioned below) is included; `public-domain', which indicates that the item is in the public domain.

SY <hardware>[:<hw add-ons>,...];<software>[:<sw add-ons>,...];
   <effort-needed>;<tools-needed>

    For each system this item runs on (or must be used on), there should be one of these lines. The fields are:

    1) The hardware it runs on. If it runs on any hardware which a particular OS runs on, the entry is `any'. If the item needs hardware other than the standard for the system, add words for it after a colon.

    2) The OS it runs under. There are several generic names like `unix' or `sysv-unix'. Optional OS things which are needed are indicated the same way hardware options are. Also, software which is not listed in this database but which is needed to make this item go is listed here. For example, were this item to be a Dbase program, this field would be: MS-DOS:Dbase-II.

    3) How much effort is needed to make it go. If following the directions is sufficient, the entry is `install'.

    4) Any tools, not normally available on your system, which one must have in order to build or use this item. All items which are in this section must also have their own entries in the information directory.

DE <some text>

    This is a short description of the item. This should be kept brief; putting the man page here is not appropriate.

Here is an entry, suitable for the databases created through comp.archives.

NM free-distribution-database
VR
AU bill@twwells.UUCP (T. William Wells)
MA bill@twwells.UUCP (T. William Wells)
EN bill@twwells.UUCP (T. William Wells) Fri Nov 11 00:56:16 EST 1988
TT Database of freely distributable, electronically accessible information.
KW database,public-domain
SY any;any;;
DE This database is constructed from the information that passes
DE through comp.archives. It contains information on any software,
DE databases, documents, or what-have-you, that is both freely
DE distributable and available electronically. "Freely
DE distributable" means that, if you have a copy of the item, you
DE can (at least) make exact copies and give them away, and you
DE don't have to tell the owner of the item (if any) that you have
DE done so. "Electronically available" means that it is either
DE accessible through a publicly accessible network, or is available
DE by a means that does not involve paying a fee to the
DE distributor. This information is provided as a free service and
DE there is *no one* guaranteeing that any of it is accurate or
DE useful. Use it at your own risk.

---

The site index ties the previous two databases together. This is the format:

<name>;<version>;<archive>;<access-tag>;<handle>;<size>;
    <date>;<tools>;<comments>

The first two fields link this entry to an entry in the info database; they correspond to the NM and VR fields. If this file is not listed in the database, these fields are blank.

`Archive' is the name of the site, as recorded in the site database.

`Access-tag' is one of the access tags specified in the site entry. Note that this is in the style of UNIX file names: wild cards are permitted.

`Handle' is used with the information from the site entry to construct the request from the archive. For example, using uucp, if the site entry contained /usr/archives as the path to which file names are relative, and this field contains foobar.shar, then the path name you should use to get this item is /usr/archives/foobar.shar.

`Size' is the size of the file in K.

`Date' is the date on which this entry was added to the database. This should be yymmdd.

`Tools' is a list of programs needed to unarchive the file; each must be a name in the info database. Standard system utilities are not listed.

`Comments' is anything useful to add.
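The nine-field index record described above could be read with something like the following Python sketch. The tuple and function names are my own, the size of 40 in the sample is invented, and letting the comments field soak up any extra semicolons is an assumption; the posting does not say what happens if a comment contains one.

```python
from collections import namedtuple
from datetime import datetime

# Field names follow the format line; the tuple itself is illustrative.
IndexEntry = namedtuple(
    "IndexEntry",
    "name version archive access_tag handle size date tools comments")


def parse_index_record(line):
    """Split one index record into its nine named fields."""
    # maxsplit=8 keeps any further semicolons inside the comments field.
    parts = line.rstrip("\n").split(";", 8)
    if len(parts) != 9:
        raise ValueError("expected 9 fields, got %d" % len(parts))
    entry = IndexEntry(*parts)
    if entry.date:
        # Raises ValueError if the date is not in yymmdd form.
        datetime.strptime(entry.date, "%y%m%d")
    return entry


# Hypothetical records: the size value is made up for illustration.
pcomm = parse_index_record(
    "unix-pcomm;version 1.1;twwells;*;pcomm.1.shar.Z;40;881021;compress;part 1")
bbs = parse_index_record(";;twwells;*;bbslist;;;;bbs systems in south Florida")
```

The second record shows the case where the name and version are blank because the file has no entry in the info database.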
---

The DB: postings contain information to update the database. The update information starts with the first line beginning with an @ and ends with a line containing @END. Additional information, not intended to be part of the database, can be added before the first @ line or after the @END line.

Commands to add data look like:

@ADD <database>

and the following data is what is to be added. <database> is one of the strings INFO, SITE, or INDEX. The new data is terminated by a blank line. This blank line is required, no matter what the next command is.

Commands to delete data look like:

@DEL <database> <key>

The key depends on what is being deleted. Deletions from the information database just use the item name. Deletions from the site database use the site name. Deletions from the archive index use the site name, the access method, and the access handle for the line to be deleted.

There is a special command to delete all index entries for a site; its form is:

@DELALL INDEX <site>

---
Bill
{uunet|novavax}!proxftl!twwells!bill
send comp.archives postings to twwells!comp-archives
send comp.archives related mail to twwells!comp-archives-request
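For anyone wanting to experiment with the DB: posting format described in the message above, here is a minimal Python sketch of a reader for it. It only turns a posting body into a list of commands; applying them to the databases is left out. The function name is hypothetical, the key of a @DEL is kept as one raw string because the posting does not say how the parts of an index key are separated, and treating any other line between commands as an error is my own choice.

```python
def parse_db_posting(text):
    """Turn the body of a DB: posting into a list of command tuples.

    ADD commands become ("ADD", database, [data lines]);
    DEL and DELALL become (command, database, key).
    """
    lines = text.splitlines()
    i = 0
    # Everything before the first line beginning with @ is ignored.
    while i < len(lines) and not lines[i].startswith("@"):
        i += 1
    commands = []
    while i < len(lines):
        line = lines[i].strip()
        i += 1
        if not line:
            continue
        if line == "@END":
            break  # anything after @END is ignored
        word, _, rest = line.partition(" ")
        if word == "@ADD":
            # Data runs up to the (required) terminating blank line.
            data = []
            while i < len(lines) and lines[i].strip():
                data.append(lines[i])
                i += 1
            commands.append(("ADD", rest.strip(), data))
        elif word in ("@DEL", "@DELALL"):
            database, _, key = rest.strip().partition(" ")
            commands.append((word[1:], database, key.strip()))
        else:
            raise ValueError("unexpected line: %r" % line)
    return commands


# Hypothetical posting body, made up for illustration.
posting = """Some remarks that are not part of the update.

@ADD INDEX
;;twwells;*;bbslist;;881111;;bbs systems in south Florida

@DEL INFO unix-pcomm
@DELALL INDEX twwells
@END
trailing signature
"""
commands = parse_db_posting(posting)
```

Comment lines beginning with # that appear inside the added data are kept with it, which matches the rule that comments are retained with the data.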
comparc@twwells.uucp (comp.archives) (12/01/88)
This is the third attempt at the database structure. Changes are still possible, so send in any comments you might have. Here is a short summary of the changes from the previous version: 1) The definition of a site is somewhat vague. What I am going to do is to consider one set of archives under the control of a single administrator as an archive site. This means that the site entry won't have different sets of data for archives located at the same site. This also means that the archive name will be somewhat less related to the address of the archive. 2) The access method and access tag of the CO fields have been swapped. The access method now comes first. --- Comments in the databases begin with a #. They are retained with the data but are otherwise ignored. In the line oriented databases, if there is a line that is to be left blank, that line should still be entered, but with everything but the keyword left blank. --- The site database contains a series of entries separated by blank lines. Each entry has the following lines: NM <the site name> EN <who added the entry and when> TM <best times to call the site> TT <the name of the archive> AD <who administers the site> MA <the administrator's mailing address> CO <information needed to set up communications with the site> IX <where the index files> KW <keywords describing the archive> DE <description of the site> Each of the lines from AD to DE may be repeated as often as necessary to contain the data. Following is a detailed description of each line. NM <the site name> This name should be related to the address used to find the site, though it doesn't have to. This should be kept fairly short. EN <user>@<site> (<name>) <date> This says who the person is who entered the database entry. The <date> is the output from the date command. TM <time zone>;[[<day>],...<from>-<to> <load>];... This lets people know when the best times are to use the archive. 
The first field is the time zone the archive is contained in; all times in the site entry are presumed to be relative to that time zone. <Day> is a three letter day abbreviation. The <from> and <to> are times in 24 hr notation. <Load> is a single word describing the load on your system at these times, the suggested words are: none, light, moderate, heavy, swamped, best, worst. TT <the name of the archive> A short title for the archive. AD <user>@<site> (<name>) The person who administers the archive. If more than one person administers the archive, there should be more than one of these. MA <the administrator's mailing address> The mailing address for help or information. Leave this blank unless you want snail-mail. People who mail to this address had better include a SASE or e-mail address or forget about getting any response. CO ftp;<access-tag>;<name>;<internet address>;<directory>;<when available> CO uucp;<access-tag>;<directory>;<L.sys entry> This line describes each method of getting at the archive. If there is more than one way to get at the archive, or more than one directory containing archive information, then there will be more than one of these lines. The <access-tag> is used when not all of your archived information is available through all paths to your site. For example, you might have a mail based server for small items but require a direct link for larger things. Each item that you list as available through your archive has a tag that is used to indicate which way it can be accessed. There may be more than one line for a single tag. This would mean that there is more than one way to get to the same set of information. The next field describes the access method. Right now, it is either uucp or ftp; more will be added as needed. The remaining fields depend on the access method. There are two fields for uucp access. The first is the path name which archive file names are relative to. The second is an L.sys entry that would be used to access your site. 
For ftp, the fields are the domain name for accessing the archive, the internet address for the above, the directory where the archive information resides, and the times when the archive is available. If the archive is always available, leave that field blank. Otherwise, format as [[<day>],...<from>-<to>];... IX <access-tag>;<handle>;<size>;<date>;<tools needed to unarchive>;<comments> This line describes the index file(s) for the archive. It is the same format as the entries in the index database, except that the first three fields are not present. You should also list README files and the like. KW <keyword>,... This is a list of keywords that describe what the site carries. DE <description of the site> This is a few lines that describe the site. This should be kept reasonably short, but should give any information not specified in the previous lines that might be useful to the archive user. --- The archived information database contains a series of entries separated by blank lines. Each entry has the following lines: NM <name of the item> VR <a version number> AU <the author of the item> MA <the maintainer of the item> EN <who entered this into the database> TT <a title for the item> KW <keywords for the item> SY <hardware and software needed for it, and how hard it is to bring it up> DE <a short description of the item> Following is a detailed description of each line. NM <name> The name of the item. If the item is a program that runs in one environment, the name is that environment hyphenated with the program name; otherwise it is just the name. Note that this is not intended to be useful by itself, e.g., unix-pcomm might eventually also refer to something that has been made to work under VMS. Should there be two items with the same name, the later item will have its author's name appended. For example, should John Turkey later write a pcomm for UNIX, it would be called unix-pcomm-turkey.
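The NM naming rule above could be sketched roughly as follows. This is only an illustration: the function name `item_name`, its parameters, and the choice to lowercase the author's surname (inferred from the unix-pcomm-turkey example) are assumptions, not part of the proposed format.

```python
def item_name(program, environment=None, author_surname=None, taken=()):
    """Build an NM value following the rule described above.

    `taken` is a hypothetical collection of names already in the
    information database; it is not something the posting defines.
    """
    # A program tied to one environment gets "<environment>-<program>";
    # anything else is just the bare name.
    name = f"{environment}-{program}" if environment else program
    # A later item that collides with an existing name gets its
    # author's name appended (lowercasing is an assumption).
    if name in taken and author_surname:
        name = f"{name}-{author_surname.lower()}"
    return name


print(item_name("pcomm", "unix"))
print(item_name("pcomm", "unix", "Turkey", taken={"unix-pcomm"}))
```

The second call models the John Turkey case from the text: the plain name is already taken, so the surname is appended.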
VR version <version> VR date <date> These tell which version this entry refers to. The first form is used for things with named versions, the second is used for something which is regularly updated. The date, for the second format, is yymmdd, and specifies the date the thing was last updated. Some things are so continuously updated that they should not have a version; for them, leave this line blank. AU <user>@<site> (<name>) This is the person or persons who wrote the thing. If there is more than one author, use more than one line. MA <user>@<site> (<name>) This is who is maintaining the item. If the item is not being maintained, leave this blank. If several people are maintaining it, use several lines. Note that anyone whose name is on one of these lines can expect e-mail about the item. EN <user>@<site> (<name>) <date> This is the person responsible for the entry and the date on which the entry was added or updated. TT <title> A title for the item. KW <keyword>,... Keywords describing the item. Some good kinds of keywords: `all-source', which means that all the source (other than that of the tools mentioned below) needed is included; `public-domain', which indicates that the item is in the public domain. SY <hardware>[:<hw add-ons>,...];<software>[:<sw add-ons>,...]; <effort-needed>;<tools-needed> For each system this item runs on (or must be used on), there should be one of these lines. The fields are: 1) The hardware it runs on. If it runs on any hardware which a particular OS runs on, the entry is `any'. If the item needs hardware other than the standard for the system, add words for it after a colon. 2) The OS it runs under. There are several generic names like `unix' or `sysv-unix'. Optional OS things which are needed are indicated the same way hardware options are. Also, software which is not listed in this database which is needed to make this item go is listed here. For example, were this item to be a Dbase program, this field would be: MS-DOS:Dbase-II.
3) How much effort is needed to make it go. If following the directions is sufficient, the entry is `install'. 4) This entry contains any tools, not normally available on your system, which one must have in order to build or use this item. All items which are in this section must also have their own entries in the information directory. DE <some text> This is a short description of the item. This should be kept brief; putting the man page here is not appropriate. Here is an entry, suitable for the databases created through comp.archives. NM free-distribution-database VR AU bill@twwells.UUCP (T. William Wells) MA bill@twwells.UUCP (T. William Wells) EN bill@twwells.UUCP (T. William Wells) Fri Nov 11 00:56:16 EST 1988 TT Database of freely distributable, electronically accessible information. KW database,public-domain SY any;any;; DE This database is constructed from the information that passes DE through comp.archives. It contains information on any software, DE databases, documents, or what-have-you, that is both freely DE distributable and available electronically. "Freely DE distributable" means that, if you have a copy of the item, you DE can (at least) make exact copies and give them away, and you DE don't have to tell the owner of the item (if any) that you have DE done so. "Electronically available" means that it is either DE accessible through a publicly accessible network, or is available DE by a means that does not involve paying a fee to the DE distributor. This information is provided as a free service and DE there is *no one* guaranteeing that any of it is accurate or DE useful. Use it at your own risk. --- The site index ties the previous two databases together. This is the format: <name>;<version>;<archive>;<access-tag>;<handle>;<size>; <date>;<tools>;<comments> The first two fields link this entry to an entry in the info database; they correspond to the NM and VR fields. If this file is not listed in the database, these fields are blank.
`Site-name' (the <archive> field) is the name of the site, as recorded in the site database. `Access-type' (the <access-tag> field) is one of the access tags specified in the site entry. Note that this is in the style of UNIX file names: wild cards are permitted. `handle' is used with the information from the site entry to construct the request from the archive. For example, using uucp, if the site entry contained /usr/archives as the path to which file names are relative, and this field contains foobar.shar, then the path name you should use to get this item is /usr/archives/foobar.shar. `Date' is the date on which this entry was added to the database. This should be yymmdd. `Tools' is a list of programs needed to unarchive the file; each must be a name in the info database. Standard system utilities are not listed. `Comments' is anything useful to add. --- The DB: postings contain information to update the database. The update information starts with the first line beginning with an @ and ends with a line containing @END. Additional information, not intended to be part of the database, can be added before the first @ line or after the @END line. Commands to add data look like: @ADD <database> and the following data is what is to be added. <database> is one of the strings INFO, SITE, or INDEX. The new data is terminated by a blank line. This blank line is required, no matter what the next command is. Commands to delete data look like: @DEL <database> <key> The key depends on what is being deleted. Deletions from the information database just use the item name. Deletions from the site database use the site name. Deletions from the archive index use the site name, the access method, and the access handle for the line to be deleted. There is a special command to delete all index entries for a site; its form is: @DELALL INDEX <site> --- Bill {uunet|novavax}!proxftl!twwells!bill send comp.archives postings to twwells!comp-archives send comp.archives related mail to twwells!comp-archives-request
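As an illustration of the DB: posting format described above, here is a minimal sketch of how an update could be pulled out of a posting. The function name `parse_db_posting` and the tuple layout it returns are hypothetical; the posting does not say how the moderator's own tools work, and the sample text is made up.

```python
def parse_db_posting(text):
    """Return a list of (command, database, payload) tuples.

    payload is a list of data lines for ADD and a key string for
    DEL / DELALL. This is a sketch under the stated assumptions,
    not the actual comp.archives software.
    """
    lines = text.splitlines()
    ops = []
    i = 0
    # Anything before the first line beginning with @ is commentary.
    while i < len(lines) and not lines[i].startswith("@"):
        i += 1
    while i < len(lines):
        line = lines[i].strip()
        i += 1
        if line.startswith("@END"):
            break  # anything after @END is commentary too
        if not line.startswith("@"):
            continue  # the blank line that terminates an @ADD block
        parts = line.split(None, 2)
        command = parts[0][1:]
        database = parts[1] if len(parts) > 1 else ""
        if command == "ADD":
            data = []
            # The new data runs up to the required blank line.
            while i < len(lines) and lines[i].strip():
                data.append(lines[i])
                i += 1
            ops.append(("ADD", database, data))
        elif command in ("DEL", "DELALL"):
            key = parts[2] if len(parts) > 2 else ""
            ops.append((command, database, key))
    return ops


sample = """Some remarks that are not part of the update.
@ADD INFO
NM unix-pcomm
VR version 1.1

@DEL SITE twwells.UUCP
@DELALL INDEX twwells.UUCP
@END
More remarks."""

for op in parse_db_posting(sample):
    print(op)
```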
comparc@twwells.uucp (comp.archives) (01/03/89)
Here is a short summary of the changes from the previous version: 1) Two new access methods have been added, one for fidonet and one for BBS's. 2) All file sizes should be in K; this was not stated in the previous version. 3) Text on the DE lines should be kept to less than 70 characters; this makes life easier for pretty-printing the archive information. 4) Lines that have fields separated by semicolons should have all the semicolons on the line, including trailing ones. This was not specified in the previous version. 5) The key separator on @DEL lines is a semicolon. This was not specified in the previous version. --- Comments in the databases begin with a #. They are retained with the data but are otherwise ignored. In the line oriented databases, if there is a line that is to be left blank, that line should still be entered, but with everything but the keyword left blank. Lines that have fields separated by semicolons should have all semicolons on the line, including trailing ones. --- The site database contains a series of entries separated by blank lines. Each entry has the following lines: NM <the site name> EN <who added the entry and when> TM <best times to call the site> TT <the name of the archive> AD <who administers the site> MA <the administrator's mailing address> CO <information needed to set up communications with the site> IX <where the index files are> KW <keywords describing the archive> DE <description of the site> Each of the lines from AD to DE may be repeated as often as necessary to contain the data. Following is a detailed description of each line. NM <the site name> This name should be related to the address used to find the site, though it doesn't have to be. This should be kept fairly short. EN <user>@<site> (<name>) <date> This says who the person is who entered the database entry. The <date> is the output from the date command. TM <time zone>;[[<day>],...<from>-<to> <load>];... This lets people know when the best times are to use the archive.
The first field is the time zone the archive is contained in; all times in the site entry are presumed to be relative to that time zone. <Day> is a three letter day abbreviation. The <from> and <to> are times in 24 hour notation. <Load> is a single word describing the load on your system at these times, the suggested words are: none, light, moderate, heavy, swamped, best, worst. TT <the name of the archive> A short title for the archive. AD <user>@<site> (<name>) The person who administers the archive. If more than one person administers the archive, there should be more than one of these. MA <the administrator's mailing address> The mailing address for help or information. Leave this blank unless you want snail-mail. People who mail to this address had better include a SASE or e-mail address or forget about getting any response. CO ftp;<access tag>;<name>;<internet address>;<directory>;<when available> CO uucp;<access tag>;<directory>;<L.sys entry> CO fido;<access tag>;<access-info> CO bbs;<access tag>;<phone>;<when available>;<modem settings>; <protocols supported>;<comments> This line describes each method of getting at the archive. If there is more than one way to get at the archive, or more than one directory containing archive information, then there will be more than one of these lines. The <access tag> is used when not all of your archived information is available through all paths to your site. Suppose that you had two archives, one of small programs that you had a mail-based server for, and another of larger stuff that you want to transfer only through uucp. Your CO line for mail access could have an access tag of `mail' and your CO line for uucp access could have an access tag of `uucp'. Files which are available only through mail would have an access tag of `mail'. Files available only through uucp would have an access tag of `uucp'. Files that were available either way would have an access tag of `*'. There may be more than one line for a single tag. 
This would mean that there is more than one way to get to the same set of information. The next field describes the access method. Right now, it is one of uucp, ftp, fido, or bbs; more will be added as needed. The remaining fields depend on the access method. There are two fields for uucp access. The first is the path name which archive file names are relative to. The second is an L.sys entry that would be used to access your site. For ftp, the fields are the domain name for accessing the archive, the internet address for the above, the directory where the archive information resides, and the times when the archive is available. If the archive is always available, leave that field blank. Otherwise, format as [[<day>],...<from>-<to>];... There is one field for fidonet. This is some information needed for accessing the archive; as yet I have no idea what this info is. There are five fields for BBS access. The first is the phone number; if you want them, use hyphens for digit separators. The second field indicates when the BBS is available; leave it blank if it is always available. The modem settings are a comma separated list of entries like: <data bits><parity><stop bits>:<speed>. Parity is represented by one of the letters: (N)o, (E)ven, (O)dd, (M)ark, (S)pace. The protocols supported field indicates which protocols are available for file transfer. The final field is for additional comments about getting into the BBS. IX <access tag>;<handle>;<size>;<date>;<tools needed to unarchive>;<comments> This line describes the index file(s) for the archive. It is the same format as the entries in the index database, except that the first three fields are not present. You should also list README files and the like. Note that the file size should be in K's. KW <keyword>,... This is a list of keywords that describe what the site carries. DE <description of the site> This is a few lines that describe the site.
This should be kept reasonably short, but should give any information not specified in the previous lines that might be useful to the archive user. The text on these lines should be kept to less than 70 characters. --- The archived information database contains a series of entries separated by blank lines. Each entry has the following lines: NM <name of the item> VR <a version number> AU <the author of the item> MA <the maintainer of the item> EN <who entered this into the database> TT <a title for the item> KW <keywords for the item> SY <hardware and software needed for it, and how hard it is to bring it up> DE <a short description of the item> Following is a detailed description of each line. NM <name> The name of the item. If the item is a program that runs in one environment, the name is that environment hyphenated with the program name; otherwise it is just the name. Note that this is not intended to be useful by itself, e.g., unix-pcomm might eventually also refer to something that has been made to work under VMS. Should there be two items with the same name, the later item will have its author's name appended. For example, should John Turkey later write a pcomm for UNIX, it would be called unix-pcomm-turkey. VR version <version> VR date <date> These tell which version this entry refers to. The first form is used for things with named versions, the second is used for something which is regularly updated. The date, for the second format, is yymmdd, and specifies the date the thing was last updated. Some things are so continuously updated that they should not have a version; for them, leave this line blank. AU <user>@<site> (<name>) This is the person or persons who wrote the thing. If there is more than one author, use more than one line. MA <user>@<site> (<name>) This is who is maintaining the item. If the item is not being maintained, leave this blank. If several people are maintaining it, use several lines. 
Note that anyone whose name is on one of these lines can expect e-mail about the item. EN <user>@<site> (<name>) <date> This is the person responsible for the entry and the date on which the entry was added or updated. TT <title> A title for the item. KW <keyword>,... Keywords describing the item. Some good kinds of keywords: `all-source', which means that all the source (other than that of the tools mentioned below) needed is included; `public-domain', which indicates that the item is in the public domain. SY <hardware>[:<hw add-ons>,...];<software>[:<sw add-ons>,...]; <effort-needed>;<tools-needed> For each system this item runs on (or must be used on), there should be one of these lines. The fields are: 1) The hardware it runs on. If it runs on any hardware which a particular OS runs on, the entry is `any'. If the item needs hardware other than the standard for the system, add words for it after a colon. 2) The OS it runs under. There are several generic names like `unix' or `sysv-unix'. Optional OS things which are needed are indicated the same way hardware options are. Also, software which is not listed in this database which is needed to make this item go is listed here. For example, were this item to be a Dbase program, this field would be: MS-DOS:Dbase-II. 3) How much effort is needed to make it go. If following the directions is sufficient, the entry is `install'. 4) This entry contains any tools, not normally available on your system, which one must have in order to build or use this item. All items which are in this section must also have their own entries in the information directory. DE <some text> This is a short description of the item. This should be kept brief; putting the man page here is not appropriate. The text on these lines should be kept to less than 70 characters. Here is an entry, suitable for the databases created through comp.archives. NM free-distribution-database VR AU bill@twwells.UUCP (T. William Wells) MA bill@twwells.UUCP (T.
William Wells) EN bill@twwells.UUCP (T. William Wells) Fri Nov 11 00:56:16 EST 1988 TT Database of freely distributable, electronically accessible information. KW database,public-domain SY any;any;; DE This database is constructed from the information that passes DE through comp.archives. It contains information on any software, DE databases, documents, or what-have-you, that is both freely DE distributable and available electronically. "Freely DE distributable" means that, if you have a copy of the item, you DE can (at least) make exact copies and give them away, and you DE don't have to tell the owner of the item (if any) that you have DE done so. "Electronically available" means that it is either DE accessible through a publicly accessible network, or is available DE by a means that does not involve paying a fee to the DE distributor. This information is provided as a free service and DE there is *no one* guaranteeing that any of it is accurate or DE useful. Use it at your own risk. --- The site index ties the previous two databases together. This is the format: <name>;<version>;<archive>;<access tag>;<handle>;<size>; <date>;<tools>;<comments> The first two fields link this entry to an entry in the info database; they correspond to the NM and VR fields. If this file is not listed in the database, these fields are blank. `Site-name' (the <archive> field) is the name of the site, as recorded in the site database. `Access-type' (the <access tag> field) is one of the access tags specified in the site entry. Note that this is in the style of UNIX file names: wild cards are permitted. `handle' is used with the information from the site entry to construct the request from the archive. For example, using uucp, if the site entry contained /usr/archives as the path to which file names are relative, and this field contains foobar.shar, then the path name you should use to get this item is /usr/archives/foobar.shar. `Size' is the size of the file, in K. `Date' is the date on which this entry was added to the database.
This should be yymmdd. `Tools' is a list of programs needed to unarchive the file; each must be a name in the info database. Standard system utilities are not listed. `Comments' is anything useful to add. --- The DB: postings contain information to update the database. The update information starts with the first line beginning with an @ and ends with a line containing @END. Additional information, not intended to be part of the database, can be added before the first @ line or after the @END line. Commands to add data look like: @ADD <database> and the following data is what is to be added. <database> is one of the strings INFO, SITE, or INDEX. The new data is terminated by a blank line. This blank line is required, no matter what the next command is. Commands to delete data look like: @DEL <database> <key> The key depends on what is being deleted. Deletions from the information database just use the item name. Deletions from the site database use the site name. Deletions from the archive index use the site name, the access method, and the access handle for the line to be deleted. Semicolons are used to separate the key fields. There is a special command to delete all index entries for a site; its form is: @DELALL INDEX <site> --- Bill {uunet|novavax}!proxftl!twwells!bill send comp.archives postings to twwells!comp-archives send comp.archives related mail to twwells!comp-archives-request
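To make the site index format above a little more concrete, here is a rough sketch of reading one index line, matching its access tag against a site's CO lines, and building the uucp path. All function names, the dictionary keys, and the sample index line are made up for illustration; matching tags with Python's `fnmatch` is one assumed reading of "in the style of UNIX file names".

```python
from fnmatch import fnmatchcase

# Field order taken from the index format given above.
INDEX_FIELDS = ("name", "version", "archive", "access_tag", "handle",
                "size_k", "date", "tools", "comments")


def parse_index_line(line):
    """Split one index line into a dict keyed by INDEX_FIELDS."""
    parts = line.split(";")
    # All semicolons must be present, even for empty trailing fields,
    # so a well-formed line always yields exactly nine parts.
    if len(parts) != len(INDEX_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(INDEX_FIELDS), len(parts)))
    return dict(zip(INDEX_FIELDS, (p.strip() for p in parts)))


def usable_co_lines(entry, co_lines):
    """Pick the CO lines whose tag matches the entry's access tag.

    co_lines is a hypothetical list of (method, tag, directory) tuples.
    """
    return [co for co in co_lines if fnmatchcase(co[1], entry["access_tag"])]


def uucp_path(directory, handle):
    """Join the CO directory and the index handle into a path name."""
    return directory.rstrip("/") + "/" + handle


# A made-up index line; tools and comments are empty but still delimited.
entry = parse_index_line(
    "unix-pcomm;version 1.1;twwells.UUCP;*;foobar.shar;120;881021;;")
co_lines = [("mail", "mail", ""), ("uucp", "uucp", "/usr/archives")]
print(usable_co_lines(entry, co_lines))
print(uucp_path("/usr/archives", entry["handle"]))
```

With an access tag of `*` both CO lines match; with a tag of `uucp` only the second would.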