emv@msen.com (Ed Vielmetti) (06/19/91)
in the near future, postings to comp.archives are going to be tagged with an explicit copyright notice. [*] this is a step in the direction of making this service fully self-supporting, with enough resources readily available to the project so that I can afford to keep it going.

as i posted in an article to comp.archives a few months ago, unless my work on this starts to yield some results, i'm going to stop distributing my efforts far and wide for free. free distribution of comp.archives is currently expected to continue to the end of the year; if things haven't worked their way out to my satisfaction, i expect to step down from moderating comp.archives some time not too long after the winter Usenix meeting. no specific dates set yet.

there are several very good reasons to stick a copyright notice of some kind on the materials which i have collected and organized for on the order of 18 months now. first and foremost, comp.archives needs some better publicity and name recognition. it's somewhat embarrassing to have people still asking how it's produced, that there's some sense that it's magic or just automatic processing that's going on. explicit copyrights will start to clue people in on just what it is they are looking at -- a production not only of some technology but also of considerable human creative input.

explicit assertion of copyright will assist me in gaining cooperation from resource providers who might be interested in producing services which were derived from my efforts. these might be on-line searchable databases, services which offered direct hands-off delivery of successful searches by anonymous ftp or uucp transfer, caches of information dynamically updated from comp.archives postings, or paper or cd-rom products that would incorporate materials derived from comp.archives.
in addition, it will enable my work to be properly credited by other researchers who are working on the "resource discovery" problem; rather than them simply saying "we searched through netnews for interesting stuff and found a lot of it, so our search stuff must be pretty good", i would expect proper credit and attribution and recognition of the substantial progress made thus far.

i expect that in the same timeframe postings to comp.archives will also be cross-posted to a new group, with the tentative name "msen.internet.archives". MSEN, Inc. will be the publisher of materials in the msen.* hierarchy; I expect to be doing this for the benefit of our operations and that of our customers and strategic partners, and these groups will be fed in accordance with that policy. I have developed a considerable amount of expertise in this area, and expect to populate the msen.* hierarchy with interesting, insightful, and consistently high quality information.

Some people who really enjoy reading comp.archives might be cut off from it for some amount of time. I'm content for that to happen; for my own needs, I can do all of the filtering and searching and sorting on netnews and just hoard that knowledge all to myself. It would be much easier to do that rather than spend the extra time adding all of the extra information, verifying that things are really there, editing down really long postings etc. That's where too much time is spent right now, and where support from people using that information is going to help me assess whether it's worthwhile continuing.

If you are currently building any services based on comp.archives (other than strictly personal use), please contact me and let me know what your plans are so that we can assure their continued viability on into 1992. if you considered building such services and rejected the notion, let me know what the limitations of the current data stream are and what you would like to see in the future.
MSEN Inc., if it ever gets sufficiently successful to actually pay any of its current employees instead of draining their own personal bank accounts :-(, will be looking for skilled people to fill a position of Internet Archivist. (Save your resumes, at the current rate of progress it's a ways off.) So far as I can tell, none of the commercial internet providers as of yet have anyone filling this role; Cerfnet and ANS have nothing along this line, UUNET's generic title is "postmaster", and everyone at PSI is working on X.500. It would make me quite happy if when we finally got to the point of hiring for this position, all of the good people had been snapped up; not too likely as far as I can tell.

I would also be happy to pursue joint development work with archivists to develop dictionaries or other classification schema and to further the state of the art in searching and text retrieval systems.

I ran across an estimate that it costs all told $200 to put together a single complete Library of Congress card catalog entry. If you look at the sustained production of about a dozen entries daily in comp.archives and value it at this rate, that gives an estimate of the potential value of this project of about $750,000 to $1M per year. I believe that MSEN could deliver this service extremely well with that sort of a budget, that it would be money well spent, and that it would be best for everyone involved if the end product wasn't burdened by any nasty copyright nonsense.

Unfortunately, the current "Interim MSEN" plans don't have a large pile of money falling out of the sky, and the realities of doing this much work for nothing are starting to catch up with me. I hope very much that things will work out well, and if they don't, well it's been fun.

--
Edward Vielmetti, vice president for research, MSEN Inc.
emv@msen.com

"With all of the attention and publicity focused on gigabit networks, not much notice has been given to small and largely unfunded research efforts which are studying innovative approaches for dealing with technical issues within the constraints of economic science." RFC 1216

[*] Pointers to materials available under the GNU General Public License will of course be freely redistributable; MSEN will not assert any copyright or place any restrictions over their redistribution, and I'll continue to try to track GNU project announcements even if I give up free distribution of everything else.
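
The valuation quoted in the posting above (about a dozen entries daily at $200 per entry) can be checked with a few lines of arithmetic. This is only a sketch: the function name `annual_value` and the choice of 365 posting days per year are assumptions made for the example, not figures from the posting.

```python
def annual_value(entries_per_day, cost_per_entry, days_per_year=365):
    """Yearly value of a stream of catalog entries, in dollars.

    days_per_year is an assumption; the posting does not say how many
    days per year entries are produced.
    """
    return entries_per_day * cost_per_entry * days_per_year


if __name__ == "__main__":
    # A dozen entries a day at $200 each, every day of the year.
    print(annual_value(12, 200))
```

With these inputs the result falls inside the $750,000 to $1M range given in the posting; a different assumption about days per year or entries per day moves it within or beyond that range.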
schoff@PSI.COM ("Martin Lee Schoffstall") (06/27/91)
Ed,

You write....

> So far as I can tell, none of the
> commercial internet providers as of yet have anyone filling this role;
> Cerfnet and ANS have nothing along this line, UUNET's generic title is
> "postmaster", and everyone at PSI is working on X.500.

Actually PSI is working on X.500, Z39.50v2, SNMP, and several research protocols...

In your initial posting you took a shot at X.500, and in some sense it is implicit in this message also. While there are a number of negative aspects vis a vis X.500 it IS intended and IS being used to register all kinds of information today. You're obviously familiar with the WhitePages work that many people have worked on over the last three years. As with any new network protocol there is the chicken and egg problem; now that we have Mac applications that are available via anonymous FTP, and soon MSDOS applications, I'd like to believe that we're going to break out of our shell!

What you may not be aware of is that X.500 is bound to information retrieval (and of course information registration). Under DARPA R&D sponsorship, PSI, along with a few other organizations (Jon Postel can speak to "FOX"), has been exploring this. We have a tool which integrates X.500 and Anonymous FTP so that you can (today) explore the RFC hierarchy by author, title, etc, and then grab the document from various sites which hold the RFC's. The tool is called x5ftp and will be released no later than the end of the project (31dec91). And we're extending the model to deal with other things than RFC's....

But again X.500 is not the perfect protocol for information retrieval, neither is Z39.50, and we won't even talk about MARC records! Recently due to a discussion in the FOX group we decided to issue the equivalent of a position paper which is titled "Towards Networked Information Retrieval", available via anonymous FTP from uu.psi.com in

    wp/nir.ms      (troff, ms macros)
    wp/ps/nir.ps   (Postscript)

Take a look.
I think your efforts are appreciated by many, and I hope they are fruitful; others are working hard too, and believe that they are on a fruitful path too. A decade from now we MAY know who was right.

Marty

PS: There is some X.500/WhitePages information on-line which can be retrieved by sending email to wp-info@psi.com; a NULL message will suffice, as it does an "auto-reply".
emv@msen.com (Ed Vielmetti) (06/28/91)
In article <9106261948.AA20843@psi.com> schoff@PSI.COM ("Martin Lee Schoffstall") writes:
> We have a tool which integrates
> X.500 and Anonymous FTP so that you can (today) explore the RFC hierarchy
> by author, title, etc, and then grab the document from various sites
> which hold the RFC's. The tool is called x5ftp and will be released
> no later than the end of the project (31dec91).
> And we're extending the model to deal with other things than RFC's....
Well, I'd have to say that dealing with RFC's is about as easy as they
come, and you'd better have a damn fine project when you're done or
I'll be quite disappointed. The texts are regular and structured,
there's a lot of boilerplate text which could be extracted out and
conclusions drawn from it, and there's a substantial amount of
"superstructure" in that RFCs reference other documents and there's a
strong sense of "this supersedes that, this modifies that, etc.".
It's a consistent, high quality, verified data stream, you should be
able to do a lot more with it than just browse author and title.
There is no "RFC Hierarchy"; the collection of RFCs is a complex,
tangled web of references, updates, improvements, discussions, and
ephemera. Attempts to impose a strict hierarchical structure on it
will fail to capture the richness of information in it.
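
To make the "web, not hierarchy" point concrete, the relationships can be held as a graph of typed links and walked outward from any document. The sketch below is only an illustration: the document identifiers, the relation names and the helpers `build_graph` and `related` are all hypothetical, and none of it describes how x5ftp or any RFC index actually works.

```python
from collections import defaultdict, deque

# Hypothetical typed links between documents: (source, relation, target).
LINKS = [
    ("doc-B", "obsoletes", "doc-A"),
    ("doc-C", "updates", "doc-B"),
    ("doc-C", "references", "doc-D"),
    ("doc-E", "references", "doc-B"),
]


def build_graph(links):
    """Index links in both directions so either end can reach the other."""
    graph = defaultdict(set)
    for source, _relation, target in links:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def related(graph, start, max_hops):
    """Return every document within max_hops links of start (breadth-first)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return sorted(doc for doc in hops if doc != start)
```

A document here can have several "parents" at once (doc-B is superseding one document while being updated and referenced by others), which is exactly what a strict tree cannot express.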
Does your tool provide any way to search through the various sections
of the RFCs? For instance, modern RFCs all have a "security
considerations" section; can you browse through those looking for RFCs
which have extensive discussion? That would be valuable.
Does your tool provide any kind of similarity metrics or groupings
between the RFCs, so that (e.g.) RFC's 1064, 1176, and 1203 are
presented together (IMAP), with RFC 1223 not too far away (POP3)? A
tool with proper browsing support would facilitate this kind of exchange.
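
One simple form such a similarity metric could take is Jaccard overlap between keyword sets. The sketch below is an assumption-laden illustration: the keyword sets are hand-assigned to follow the grouping suggested above, not extracted from the RFC texts, and `jaccard` and `nearest` are hypothetical helper names.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hand-assigned keywords, purely illustrative (not derived from the documents).
KEYWORDS = {
    "RFC 1064": {"mail", "access", "imap"},
    "RFC 1176": {"mail", "access", "imap"},
    "RFC 1203": {"mail", "access", "imap"},
    "RFC 1223": {"mail", "access", "pop3"},
}


def nearest(doc, keywords):
    """Rank the other documents by similarity to doc, most similar first."""
    scores = [
        (jaccard(keywords[doc], kw), other)
        for other, kw in keywords.items()
        if other != doc
    ]
    # Sort by descending score, then by name so ties are stable.
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))
```

Under these made-up keywords the three IMAP documents score as identical to one another and the fourth sits "not too far away"; a real tool would need keywords drawn from the texts themselves.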
Several RFCs reference materials which are available for anonymous FTP
from other sites; does your browser have direct support for (e.g.) the
NOCTOOLS catalog? A good system would let you point and click and get
the goods delivered back to your local machine.
A proper browser or filtering agent would have the ability to store
queries for later replay. If I find an RFC that I like, can I store
the query that found it so that the next time an RFC (or internet draft)
is issued that's similar to it I will be notified?
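
A stored query of this kind can be as small as a named set of terms that is replayed against each newly issued document. This is a minimal sketch under stated assumptions: the class `StoredQueries`, its methods, and the all-terms-must-appear matching rule are invented for the example and do not describe any existing browser.

```python
class StoredQueries:
    """Keep named keyword queries and replay them against new documents."""

    def __init__(self):
        self._queries = {}

    def save(self, name, terms):
        # Store the query as a set of lowercase terms that must all be present.
        self._queries[name] = {term.lower() for term in terms}

    def matches(self, text):
        """Return the names of saved queries whose terms all occur in text."""
        words = set(text.lower().split())
        return sorted(
            name for name, terms in self._queries.items() if terms <= words
        )
```

Usage would be to call `save` once when an interesting document is found, then call `matches` on the title or abstract of each new arrival and notify the reader for every name returned.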
An RFC tool would be a useful thing, but I don't have high hopes for
x5ftp, to the extent that X.500 is a gubbishy protocol for these kinds
of searches and that you're constrained to use that technology.
> ... we decided to issue the equivalent
> of a position paper which is titled "Towards Networked Information
> Retrieval"...
Full citation below. A reasonably good paper, albeit wordy,
illustrating the defects in both X.500 and Z39.50 for information
retrieval; neither process seems adequate to handle the problem at
hand, though you could argue that any work in this area is progress
and should be supported. Notably missing from the paper is a mention
of Brewster Kahle's WAIS project (see below), which is an
implementation of Z39.50 that addresses some of the defects mentioned.
Edward Vielmetti, vice president for research, MSEN Inc. emv@msen.com
"(6) The Plan shall identify how agencies and departments can
collaborate to ... expand efforts to improve, document, and evaluate
unclassified public-domain software developed by federally-funded
researchers and other software, including federally-funded educational
and training software; "
"High-Performance Computing Act of 1991, S. 272"
-- MSEN Archive Service file verification
uu.psi.com
-r--r--r-- 1 dsadmin staff 50666 Jun 25 18:27 /wp/nir.ms
-r--r--r-- 1 dsadmin staff 56973 Jun 27 11:16 /wp/nir.txt
-r--r--r-- 1 dsadmin guest 117611 Jun 25 18:27 /wp/ps/nir.ps
found psi-networked-information-retrieval ok
uu.psi.com:/wp/{nir*,ps/nir*}
-- MSEN Archive Service file verification
quake.think.com
total 5561
drwxrwxrwx 2 14 1024 Jun 25 00:07 wais-discussion
-rw-rw-rw- 1 1637 635857 Jun 21 21:50 WAIStation-Canned-Demo.sit.hqx
-r--r--r-- 1 14 463981 Jun 13 20:44 wais-8-b1.tar.Z
-rw-rw-r-- 1 1556 475161 May 21 18:43 wais-8-a12-3.tar.Z
-rw-rw-rw- 1 1637 635225 May 16 03:01 WAIStation-0-62.sit.hqx
-rw-rw-rw- 1 999 321268 May 13 20:48 wais-ir12.ZU
-rw-rw-rw- 1 14 409388 Apr 5 00:44 wais-8-a11.tar.Z
-rw-rw-rw- 1 1637 1094536 Mar 28 00:37 WAIStation-0-62-Sources.sit.hqx
-rw-rw-rw- 1 14 1070714 Mar 23 01:24 WAIStation-0-61.sit.hqx
-rw-rw-rw- 1 14 475815 Mar 23 01:19 wais-8-a10.tar.Z
found wais ok
quake.think.com:/pub/wais/