[comp.protocols.tcp-ip] copyright status and future development of comp.archives

emv@msen.com (Ed Vielmetti) (06/19/91)

in the near future, postings to comp.archives are going to be tagged
with an explicit copyright notice. [*]  this is a step in the direction of
making this service fully self-supporting, with enough resources
readily available to the project so that I can afford to keep it
going.  as i posted in an article to comp.archives a few months ago,
unless my work on this starts to yield some results, i'm going to stop
distributing my efforts far and wide for free.  free distribution of
comp.archives is currently expected to continue to the end of the
year; if things haven't worked their way out to my satisfaction, i
expect to step down from moderating comp.archives some time not too
long after the winter Usenix meeting.  no specific dates set yet.

there are several very good reasons to stick a copyright notice of
some kind on the materials which i have collected and organized for on
the order of 18 months now.  first and foremost, comp.archives needs
some better publicity and name recognition.  it's somewhat
embarrassing to have people still asking how it's produced, with
some sense that it's magic or just automatic processing that's
going on.  explicit copyrights will start to clue people in on just
what it is they are looking at  -- a production not only of some
technology but also of considerable human creative input.  

explicit assertion of copyright will assist me in gaining cooperation
from resource providers who might be interested in producing services
which were derived from my efforts.  these might be on-line searchable
databases, services which offered direct hands-off delivery of
successful searches by anonymous ftp or uucp transfer, caches of
information dynamically updated from comp.archives postings, or paper
or cd-rom products that would incorporate materials derived from
comp.archives.  in addition, it will enable my work to be properly
credited by other researchers who are working on the "resource
discovery" problem; rather than them simply saying "we searched
through netnews for interesting stuff and found a lot of it, so our
search stuff must be pretty good", i would expect proper credit and
attribution and recognition of the substantial progress made thus far.

i expect that in the same timeframe postings to comp.archives
will also be cross-posted to a new group, with the tentative name
"msen.internet.archives".  MSEN, Inc. will be the publisher of
materials in the msen.* hierarchy; I expect to be doing this for the
benefit of our operations and that of our customers and strategic
partners, and these groups will be fed in accordance with that policy.
I have developed a considerable amount of expertise in this area, and
expect to populate the msen.* hierarchy with interesting, insightful,
and consistently high quality information. 

Some people who really enjoy reading comp.archives might be cut off
from it for some amount of time.  I'm content for that to happen; for
my own needs, I can do all of the filtering and searching and sorting
on netnews and just hoard that knowledge all to myself.  It would be
much easier to do that rather than spend the extra time adding all of
the extra information, verifying that things are really there, editing
down really long postings, etc.  That's where too much time is spent
right now, and where support from people using that information is
going to help me assess whether it's worthwhile continuing.

If you are currently building any services based on comp.archives
(other than strictly personal use), please contact me and let me know
what your plans are so that we can assure their continued viability on
into 1992.  if you considered building such services and rejected the
notion, let me know what the limitations of the current data stream
are and what you would like to see in the future.

MSEN Inc., if it ever gets sufficiently successful to actually pay any
of its current employees instead of draining their own personal bank
accounts :-(, will be looking for skilled people to fill a position of
Internet Archivist.  (Save your resumes, at the current rate of
progress it's a ways off.)  So far as I can tell, none of the
commercial internet providers as of yet have anyone filling this role;
Cerfnet and ANS have nothing along this line, UUNET's generic title is
"postmaster", and everyone at PSI is working on X.500.  It would make
me quite happy if when we finally got to the point of hiring for this
position, all of the good people had been snapped up; not too likely
as far as I can tell.  I would also be happy to pursue joint
development work with archivists to develop dictionaries or other
classification schema and to further the state of the art in searching
and text retrieval systems.

I ran across an estimate that it costs all told $200 to put together a
single complete Library of Congress card catalog entry.  If you look
at the sustained production of about a dozen entries daily in
comp.archives and value it at this rate, that's an estimate of
the potential value of this project at about $750,000 to $1M per year.
I believe that MSEN could deliver this service extremely well with
that sort of a budget, that it would be money well spent, and that it
would be best for everyone involved if the end product wasn't burdened
by any nasty copyright nonsense.  Unfortunately, the current "Interim
MSEN" plans don't have a large pile of money falling out of the sky,
and the realities of doing this much work for nothing are starting to
catch up with me.  I hope very much that things will work out well, and
if they don't, well it's been fun.
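
As a rough check of the arithmetic above, here is a minimal sketch.  The
$200 per entry and "about a dozen entries daily" come from the estimate
itself; the 365-day production year and all of the names in the code are
assumptions made for illustration.

```python
# Rough check of the yearly value estimate.
# From the post: $200 per catalog entry, about a dozen entries daily.
# Assumption: entries are produced every day of a 365-day year.
COST_PER_ENTRY_USD = 200
ENTRIES_PER_DAY = 12
DAYS_PER_YEAR = 365


def annual_value(entries_per_day, cost_per_entry=COST_PER_ENTRY_USD,
                 days=DAYS_PER_YEAR):
    """Return the yearly value in dollars at a fixed per-entry cost."""
    return entries_per_day * cost_per_entry * days


if __name__ == "__main__":
    print(annual_value(ENTRIES_PER_DAY))  # 876000
```

Under those assumptions a dozen entries a day works out to $876,000 per
year, which sits inside the $750,000 to $1M range quoted above.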

-- 
Edward Vielmetti, vice president for research, MSEN Inc. emv@msen.com

"With all of the attention and publicity focused on gigabit networks,
not much notice has been given to small and largely unfunded research
efforts which are studying innovative approaches for dealing with
technical issues within the constraints of economic science."  
							RFC 1216

[*] Pointers to materials available under the GNU General Public License will
of course be freely redistributable; MSEN will not assert any
copyright or place any restrictions over their redistribution, and
I'll continue to try to track GNU project announcements even if I give
up free distribution of everything else.

schoff@PSI.COM ("Martin Lee Schoffstall") (06/27/91)

Ed,
 
You write....
> So far as I can tell, none of the
> commercial internet providers as of yet have anyone filling this role;
> Cerfnet and ANS have nothing along this line, UUNET's generic title is
> "postmaster", and everyone at PSI is working on X.500.

Actually PSI is working on X.500, Z39.50v2, SNMP, and several research
protocols...

In your initial posting you took a shot at X.500, and in some sense
it is implicit in this message also.  While there are a number of
negative aspects vis a vis X.500 it IS intended and IS being used to
register all kinds of information today.  You're obviously familiar with
the WhitePages work that many people have worked on over the last three
years.  As with any new network protocol there is the chicken and egg
problem, now that we have Mac applications that are available via anonymous
FTP and soon MSDOS applications I'd like to believe that we're going to
break out of our shell!

What you may not be aware of is that X.500 is bound to information retrieval
(and of course information registration).  Under DARPA R&D sponsorship,
PSI, along with a few other organizations (Jon Postel can speak to
"FOX"), has been exploring this.  We have a tool which integrates
X.500 and Anonymous FTP so that you can (today) explore the RFC hierarchy
by author, title, etc, and then grab the document from various sites
which hold the RFC's.  The tool is called x5ftp and will be released
no later than the end of the project (31dec91).

And we're extending the model to deal with other things than RFC's....

But again X.500 is not the perfect protocol for information retrieval,
neither is Z39.50, and we won't even talk about MARC records!  Recently
due to a discussion in the FOX group we decided to issue the equivalent
of a position paper which is titled "Towards Networked Information
Retrieval", available via anonymous FTP from uu.psi.com in

	wp/nir.ms (troff, ms macros)
	wp/ps/nir.ps (PostScript)

Take a look.

I think your efforts are appreciated by many, and I hope they are
fruitful.  Others are working hard too, and believe that they are on a
fruitful path as well.  A decade from now we MAY know who was right.

Marty

PS:  There is some X.500/WhitePages information on-line which can be
	retrieved by sending email to wp-info@psi.com, a NULL message
	will suffice, as it does an "auto-reply".

emv@msen.com (Ed Vielmetti) (06/28/91)

In article <9106261948.AA20843@psi.com> schoff@PSI.COM ("Martin Lee Schoffstall") writes:

   We have a tool which integrates
   X.500 and Anonymous FTP so that you can (today) explore the RFC hierarchy
   by author, title, etc, and then grab the document from various sites
   which hold the RFC's.  The tool is called x5ftp and will be released
   no later than the end of the project (31dec91).

   And we're extending the model to deal with other things than RFC's....

Well, I'd have to say that dealing with RFCs is about as easy as they
come, and you'd better have a damn fine project when you're done or
I'll be quite disappointed.  The texts are regular and structured,
there's a lot of boilerplate text which could be extracted out and
conclusions drawn from it, and there's a substantial amount of
"superstructure" in that RFCs reference other documents and there's a
strong sense of "this supersedes that, this modifies that, etc.".
It's a consistent, high quality, verified data stream; you should be
able to do a lot more with it than just browse author and title.

There is no "RFC Hierarchy"; the collection of RFCs is a complex,
tangled web of references, updates, improvements, discussions, and
ephemera.  Attempts to impose a strict hierarchical structure on it
will fail to capture the richness of information in it.  
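
To make the "web, not hierarchy" point concrete, here is a minimal
sketch of the relations as a directed graph.  The document labels and
the relations between them are hypothetical, made up for illustration;
they are not taken from the real RFC index.

```python
from collections import defaultdict

# Hypothetical documents and relations, for illustration only.
RELATIONS = [
    ("doc-C", "obsoletes", "doc-A"),
    ("doc-C", "updates", "doc-B"),
    ("doc-D", "references", "doc-A"),
    ("doc-D", "references", "doc-C"),
]


def build_graph(relations):
    """Map each document to the (relation, target) edges leaving it."""
    graph = defaultdict(list)
    for source, kind, target in relations:
        graph[source].append((kind, target))
    return graph


def inbound(relations, target):
    """List every (source, relation) pair that points at `target`."""
    return sorted((s, k) for s, k, t in relations if t == target)


if __name__ == "__main__":
    # doc-A is pointed at by two different documents, so it has no
    # single parent and cannot be placed in a strict tree.
    print(inbound(RELATIONS, "doc-A"))
```

In a strict hierarchy every document would have exactly one parent; as
soon as a document is both obsoleted by one text and referenced by
another, a graph is the more natural model.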

Does your tool provide any way to search through the various sections
of the RFCs?  For instance, modern RFCs all have a "security
considerations" section; can you browse through those looking for RFCs
which have extensive discussion?  That would be valuable.
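
A minimal sketch of that kind of section search follows.  It assumes a
plain-text layout in which section headings start in the first column
and body text is indented; that layout, the function names, and the
sample documents are assumptions for illustration, not a description of
x5ftp or any existing tool.

```python
def section_word_count(text, heading="security considerations"):
    """Count the words under the section whose heading contains `heading`.

    Assumption: a heading is any non-empty line that starts in column 0;
    indented and blank lines belong to the section above them.
    """
    count = 0
    in_section = False
    for line in text.splitlines():
        is_heading = bool(line) and not line[0].isspace()
        if is_heading:
            in_section = heading in line.lower()
            continue
        if in_section:
            count += len(line.split())
    return count


def rank_by_section(docs, heading="security considerations"):
    """Return document names, longest matching section first."""
    return sorted(docs, key=lambda name: (-section_word_count(docs[name], heading), name))


# Hypothetical sample documents, not real RFC text.
SAMPLE_LONG = (
    "1. Introduction\n"
    "   Some text here.\n"
    "\n"
    "2. Security Considerations\n"
    "   Passwords are sent in the clear.\n"
    "   Use with care.\n"
    "\n"
    "3. References\n"
    "   None.\n"
)
SAMPLE_SHORT = (
    "1. Overview\n"
    "   Text.\n"
    "\n"
    "2. Security Considerations\n"
    "   Not discussed.\n"
)

if __name__ == "__main__":
    docs = {"long": SAMPLE_LONG, "short": SAMPLE_SHORT}
    print(rank_by_section(docs))
```

Word count is only a crude stand-in for "extensive discussion", but it
is enough to sort documents so the ones worth reading come first.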

Does your tool provide any kind of similarity metrics or groupings
between the RFCs, so that (e.g.) RFCs 1064, 1176, and 1203 are
presented together (IMAP), with RFC 1225 not too far away (POP3)?  A
tool with proper browsing support would facilitate this kind of exchange.
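
One simple similarity metric of this kind is Jaccard overlap between
keyword sets.  The sketch below uses hypothetical document labels and
keywords chosen for illustration; a real tool would have to extract the
terms from the texts, and nothing here describes how x5ftp works.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword collections (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def nearest(name, docs):
    """Return the other documents, most similar to `name` first."""
    scored = [(jaccard(docs[name], keywords), other)
              for other, keywords in docs.items() if other != name]
    return [other for score, other in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical keyword sets, for illustration only.
DOCS = {
    "mail-access-1": {"mail", "mailbox", "remote", "access", "interactive"},
    "mail-access-2": {"mail", "mailbox", "remote", "access", "search"},
    "maildrop": {"mail", "mailbox", "download", "maildrop"},
    "routing": {"gateway", "routing", "metric"},
}

if __name__ == "__main__":
    print(nearest("mail-access-1", DOCS))
```

With these sets the two mail-access documents come out closest to each
other, the maildrop document lands not too far away, and the routing
document falls to the end of the list.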

Several RFCs reference materials which are available for anonymous FTP
from other sites; does your browser have direct support for (e.g.) the
NOCTOOLS catalog?  A good system would let you point and click and get
the goods delivered back to your local machine.

A proper browser or filtering agent would have the ability to store
queries for later replay.  If I find an RFC that I like, can I store
the query that found it so that the next time an RFC (or internet draft)
is issued that's similar to it I will be notified?
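
Stored queries with later replay can be sketched in a few lines.  The
class, its method names, and the all-terms-must-appear matching rule
are hypothetical choices made for illustration; a real filtering agent
would also need persistent storage and a delivery mechanism.

```python
class StoredQueries:
    """Keep named queries and replay them against newly issued titles."""

    def __init__(self):
        self._queries = {}

    def save(self, name, terms):
        """Store a query as a set of lowercase terms."""
        self._queries[name] = {term.lower() for term in terms}

    def matches(self, title):
        """Return names of stored queries whose terms all occur in `title`."""
        words = set(title.lower().split())
        return sorted(name for name, terms in self._queries.items()
                      if terms <= words)


if __name__ == "__main__":
    queries = StoredQueries()
    queries.save("mail-protocols", ["mail", "protocol"])
    # Replay the stored queries each time a new document is announced.
    print(queries.matches("Interactive Mail Access Protocol"))
    print(queries.matches("Routing Information Protocol"))
```

The first title triggers the stored "mail-protocols" query and the
second does not, which is the notification behavior asked about above.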

An RFC tool would be a useful thing, but I don't have high hopes for
x5ftp, to the extent that X.500 is a gubbishy protocol for these kinds
of searches and that you're constrained to use that technology.

   ... we decided to issue the equivalent
   of a position paper which is titled "Towards Networked Information
   Retrieval"...

Full citation below.  A reasonably good paper, albeit wordy,
illustrating the defects in both X.500 and Z39.50 for information
retrieval; neither process seems adequate to handle the problem at
hand, though you could argue that any work in this area is progress
and should be supported.  Notably missing from the paper is a mention
of Brewster Kahle's WAIS project (see below), which is an
implementation of Z39.50 that addresses some of the defects mentioned.

Edward Vielmetti, vice president for research, MSEN Inc. emv@msen.com

"(6) The Plan shall identify how agencies and departments can
collaborate to ... expand efforts to improve, document, and evaluate
unclassified public-domain software developed by federally-funded
researchers and other software, including federally-funded educational
and training software; "
			"High-Performance Computing Act of 1991, S. 272"

-- MSEN Archive Service file verification
uu.psi.com
-r--r--r--  1 dsadmin  staff       50666 Jun 25 18:27 /wp/nir.ms
-r--r--r--  1 dsadmin  staff       56973 Jun 27 11:16 /wp/nir.txt
-r--r--r--  1 dsadmin  guest      117611 Jun 25 18:27 /wp/ps/nir.ps
found psi-networked-information-retrieval ok
uu.psi.com:/wp/{nir*,ps/nir*}

-- MSEN Archive Service file verification
quake.think.com
total 5561
drwxrwxrwx  2 14           1024 Jun 25 00:07 wais-discussion
-rw-rw-rw-  1 1637       635857 Jun 21 21:50 WAIStation-Canned-Demo.sit.hqx
-r--r--r--  1 14         463981 Jun 13 20:44 wais-8-b1.tar.Z
-rw-rw-r--  1 1556       475161 May 21 18:43 wais-8-a12-3.tar.Z
-rw-rw-rw-  1 1637       635225 May 16 03:01 WAIStation-0-62.sit.hqx
-rw-rw-rw-  1 999        321268 May 13 20:48 wais-ir12.ZU
-rw-rw-rw-  1 14         409388 Apr  5 00:44 wais-8-a11.tar.Z
-rw-rw-rw-  1 1637      1094536 Mar 28 00:37 WAIStation-0-62-Sources.sit.hqx
-rw-rw-rw-  1 14        1070714 Mar 23 01:24 WAIStation-0-61.sit.hqx
-rw-rw-rw-  1 14         475815 Mar 23 01:19 wais-8-a10.tar.Z
found wais ok
quake.think.com:/pub/wais/