[comp.archives] From the moderator

bill@twwells.uucp (T. William Wells) (11/01/88)

Hello all,

First prize, for speedy response, goes to Mike Wexler
<wyse.com!mikew>, the moderator of comp.sources.x. However, the data
he sent me is in his own format (though converting it will be
mechanical), so you won't get that first.

He also sent it to me at twwells!bill. I guess that that was my fault,
for not specifying, so let me suggest the following: if you have
something relating to comp.archives and you want me to consider
posting it, send it to comp-archives. If you don't want it posted,
send it to comp-archives-request, and I'll attend to it as it seems
necessary. Please do not send comp.archives stuff to me directly, as
that merely means that I have to move it around, taking more of the
time I don't have.

The second prize goes to Paul Vixie <decwrl!vixie>. He not only sent
an early response, but he even sent an entry which I can use directly.
That one will go out first, as I don't have to do any work. (hint,
hint :-)

	[And yes, it would be odd if an Objectivist upstaged a
	business-type with a *free* alternative. At least, so it
	would seem to a non-Objectivist. But what do they know? :-)]

However, there is a question as to what form the CO (communication)
line should take for ftp access. Here is the line he sent:

CO ftp:gatekeeper.dec.com:128.45.9.52:/pub:no time restrictions

Firstly, this line should be:

CO ftp;ftp;gatekeeper.dec.com;128.45.9.52;/pub;0000-2359

What's the difference? Notice the added ftp field at the start of the line.
This is used to identify lines in the index which can be reached via
some particular communications method.  This is irrelevant for most
sites, but in various circumstances, it can be important.

Also, the last field has been changed to a time range, rather than
text, for uniformity.

Finally, the field delimiters have been changed to semicolons.

Since I have no experience with ftp, I am going to have to take
others' word for what is needed to do ftp. The fields in his
suggested format are:

CO ftp;ftp;gatekeeper.dec.com;128.45.9.52;/pub;0000-2359
	   |                  |           |     \-accessible times
	   \-domainized       |           \-relative path of archives for
	     host name        \-IP address  anonymous ftp login

Any comments?

He also asks if I know anything about bib/refer. Well, I don't.  He
suggests that I might want to make my database consistent with that
format. Any comments? (Paul: you might as well send me the man page;
I'll need some information to make an informed decision.)

---

Rich $alz sent me e-mail; since I don't see anything in it that I
imagine he would not want published, I'll quote from it. I hope that
no one is offended by this.

: >Since there is no discussion group related to this one, I will accept
: >other kinds of postings as well. I particularly encourage people who
: >are having problems getting to archive sites to post; after all, what
: >good is this newsgroup if you can't use the info coming from it?  I
: >also encourage those who have answers to respond.
:
: Consider first contacting the archive maintainer.  Archives are generally
: a free public service (not so for UUCP-UUNET, where it's what people
: pay for) and public lambastings will make folks very gun-shy about being
: nice any more...

I was thinking about questions on the order of "I tried using kermit
to get into site foobar. But when I got the file back, all I got was
garbage!" And I was hoping that I'd get responses of the order "Did
you set binary mode?"

Since I have no desire to offend the very people who make this
newsgroup possible, I am certainly not going to permit flaming to be
directed their way. :-)

Better yet, let's have no flames at all!

: Your fancy format seems reasonable enough, but I'd add a couple of
: conventions
:         RE      random remarks

The DE fields in the information and site databases serve that
purpose.

I do not want to add any more lines to the content database: I
expect it to be by far the largest of the databases, and I want to
avoid adding to the work of extracting information from it.

:         ##      ignored line -- e.g., you can explain the database
:         ##      format here, put in sequence numbers, etc.

Well, I intend to post update messages rather than the whole
database, so such lines are irrelevant. However, anything above the
first control line of an update message should be ignored, so I can
put whatever I want there.
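
As a rough illustration, a receiving program could discard that
free-form preamble like this. The post does not define a "control
line"; this Python sketch assumes it is any line that starts with a
two-letter upper-case tag (CO, MA, AU, DE, ...) followed by a space or
end of line. The function name strip_preamble is hypothetical, and the
test is crude: a preamble line that happens to begin with such a tag
would be mistaken for data.

```python
import re

# Assumed definition of a control line: two upper-case letters
# followed by whitespace or end of line.
CONTROL = re.compile(r"^[A-Z]{2}(\s|$)")


def strip_preamble(message: str) -> list[str]:
    """Drop all free text above the first control line of an update."""
    lines = message.splitlines()
    for i, line in enumerate(lines):
        if CONTROL.match(line):
            return lines[i:]
    return []  # no control lines at all: nothing to apply
```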

: Oh yeah, one more thing.  You might want to make a provision for
: places where archive "responsibility" is split up.  E.g., on UUNET
: I do comp.sources.unix and (maybe) c.s.misc archive-maintenance, but
: David Comay, dsc@uunet.uu.net, handles the rest.

Good point. Probably the simplest way is to permit several MA/CO...
sequences in the site entry. Unless someone provides a good reason
not to, that is how I'll do it.
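
One way to read several MA/CO sequences back out of a site entry is
sketched below in Python: each MA line opens a new group, and the CO
lines after it belong to that maintainer. The grouping rule, the
function name group_responsibilities, and the sample values are
assumptions for illustration, not a settled format.

```python
def group_responsibilities(entry: list[str]) -> list[tuple[str, list[str]]]:
    """Split a site entry into (maintainer, CO lines) groups.

    Each MA line starts a new group; CO lines that follow it are
    attached to that group. Lines with any other tag are skipped.
    """
    groups: list[tuple[str, list[str]]] = []
    for line in entry:
        tag, _, value = line.partition(" ")
        if tag == "MA":
            groups.append((value, []))
        elif tag == "CO":
            if not groups:
                raise ValueError("CO line before any MA line")
            groups[-1][1].append(value)
    return groups
```

For a hypothetical entry with "MA maintainer-a" and "MA maintainer-b",
each followed by its own CO line, this yields two groups, one per
maintainer.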

: Not to be harsh, but the combined listing is just total blue sky... :-)
: (Yeah, yeah, and man will never fly, neither)

Don't I know it! I believe it is impossible for one person to do
this; what I am relying on is bootstrapping. If I do this long
enough, and effectively enough, I can hope for cooperation from the
various people who can contribute in one way or another.

After a while, assuming this becomes something of an
institution, it might become the norm that when someone creates or
archives an FD program, they send in the appropriate information as a
matter of course. Then this thing might fly!

: >. In
: >particular I would like to hear from those who administer archive
: >sites, those who moderate newsgroups that contain information which is
: >archived.
:
: One good way is to run through the newsgroups and contact all the comp.sources
: and comp.binaries moderators with requests for info.

I have already collected well over 300K of information from the net,
and I will get around to processing it. Real Soon Now.

---

Ashwin Ram <proxftl!uunet!ames!harvard!yale!ram-ashwin> asks:

: I like the idea.  However, is it really necessary to make the format so
: cryptic?  After all, much of the time humans will be browsing through the
: listings, so it makes sense to make them a little more descriptive.  E.g., AU
: and MA aren't particularly intuitive for someone who isn't already familiar
: with the format.

The intention is to make the database reasonably compact and easy to
process. Programs at the receiving end could be used to convert it
into something easier to read. Under UNIX, a simple sed script could
convert all those cryptic abbreviations into something a bit more
understandable.
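
The idea is roughly what a sed command such as
sed 's/^CO /Communication: /' does for one tag. Since this sketch is
in Python rather than sed, the sketch below applies the same
line-start substitution for a small table of tags. Only CO
("communication") is defined in this post; the expansions for AU, MA
and DE are guesses made for illustration, and expand_tags is a
hypothetical name. Unknown tags are left untouched.

```python
import re

# CO is defined in the post; the other expansions are assumptions.
LABELS = {
    "CO": "Communication",
    "AU": "Author",       # assumed
    "MA": "Maintainer",   # assumed
    "DE": "Description",  # assumed
}

TAG = re.compile(r"^([A-Z]{2}) ")


def expand_tags(text: str) -> str:
    """Replace a known two-letter tag at the start of each line."""
    def label(m: re.Match) -> str:
        name = LABELS.get(m.group(1))
        return f"{name}: " if name else m.group(0)

    return "\n".join(TAG.sub(label, line) for line in text.splitlines())
```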

:                   I suspect you'll find yourself having to post format
: descriptions every so often, which indicates that the format isn't
: self-explanatory.

I expect that no matter what I do I'll have to post them, if for no
other reason than to give instructions on how to fill one out, that
being a harder task than just reading one.

---

David Fickes <harvard!bu-it.bu.edu!bu-albert.BU.EDU!berlin> writes:

: It's not completely clear how someone who is looking for a particular
: item... will find it and then know which site it is at... Preferably,
: I'd rather be able to send a note.. to query a database that would then
: answer my note with a single description of the available packages and
: would also give me a list of sites to go searching for them....

There are two separate questions here:

   1) How do I use the database?

      If you don't know the name of the item, you look in the
      information database to find it. Presumably the data there is
      sufficient to let you find the right thing.

      Once you have the name, you look through the content database
      to see which sites have it.

      You then go looking through the site database to see which one
      you want to use.

	This does raise the point that these databases ought to be
	set up so that one can use an editor on them.  This means
	keeping individual file sizes small. I'll give that some
	thought.

   2) Where is the database?

      I am not proposing that comp.archives have all the data
      available and held in your spool area; that would eat a lot of
      disk space. Instead, I am hoping that, once the size of the
      database starts to get unreasonable, some people will decide to
      make their archives available. When that happens, I'll
      regularly post information about those archive sites.

      I'm also hoping that someone will see it in their interest to
      write some software for this.  Time will tell.
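
The three lookups in answer 1 can be sketched in Python as below,
assuming each database has already been loaded into a plain
dictionary: information (item name to description), content (item
name to the sites carrying it) and site (site name to its CO lines).
That in-memory layout, the function name find_sites, and the sample
data are all hypothetical; the real databases are text files.

```python
def find_sites(
    keyword: str,
    info: dict[str, str],
    content: dict[str, list[str]],
    sites: dict[str, list[str]],
) -> dict[str, dict[str, list[str]]]:
    """Information -> content -> site database, as described above."""
    # Step 1: information database, find items whose description matches.
    names = [n for n, desc in info.items() if keyword.lower() in desc.lower()]
    results: dict[str, dict[str, list[str]]] = {}
    for name in names:
        # Step 2: content database, which sites carry this item.
        # Step 3: site database, how each of those sites is reached.
        results[name] = {s: sites.get(s, []) for s in content.get(name, [])}
    return results
```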

: ps: do you happen to be able to mail me a copy of pcomm?  I'd be
: VERY interested...

As it happens, I can't. Right now, my system is connected to the
outside world by my employer's machine, and they'd look askance at my
giving out their UUCP login information. Maybe after I get my modem
working, assuming that you can't find it elsewhere.

---

Brad Allen <ulmo@ssyx.ucsc.edu | ulmo@splat.aptos.ca.us> writes:

: I would suggest to get comp.archives up and running that you scavenge
: a bit for a few supporters ...
: here are a few ideas:
:         - writer of XBBS in xenix newsgroup
:         - misc. postings you see here and there

Like I said, I have over 300K of stuff to sift through.

What I really need to do is to get the database format solid and then
convince people to use it.

: What I'm getting at is perhaps you need to do a little work yourself,
: mailing these people asking for archive listings to be posted.

Oh, yes. Definitely. But I am also going to have to be careful to
avoid being swamped. The quantity of information out there is just
staggering.

---
Bill
{uunet|novavax}!proxftl!twwells!bill

rick@seismo.CSS.GOV (Rick Adams) (11/04/88)

> : Consider first contacting the archive maintainer.  Archives are generally
> : a free public service (not so for UUCP-UUNET, where it's what people
> : pay for) and public lambastings will make folks very gun-shy about being
> : nice any more...

Note that the uunet archives ARE free.  We charge for access to the machine,
not the archives. If you think it's a petty distinction, the Internet folks
can (and do) ftp all they want at no charge. It is analogous to paying
a distribution fee for a tape that has "free" software on it.

> : I do comp.sources.unix and (maybe) c.s.misc archive-maintenance, but
> : David Comay, dsc@uunet.uu.net, handles the rest.

This will come as a great surprise to David...

--rick