[comp.archives.admin] Commercial Archives

laws@ai.sri.com (Kenneth I. Laws) (06/23/91)

'Scuse me, I'm new here.

I've been following with great interest Ed's discussion of
the volunteer-moderator problem, together with the cogent
comments of others about sharing the load, growing the net
into a real-world service, and giving people an incentive
to properly catalog their own submissions.

I'm curious about one issue: is the goal to create a single
(distributed) archive?  I can see some efficiency advantages
in avoiding duplicate storage, and some administrative
advantages in having a single indexing system, but I don't
see real-world analogies showing that this is the way to go.
The Library of Congress is a special case, and one could argue
for NASA engineering archives as a model.  But there is no
way that you can compete on a governmental scale.  Why are
you not aiming for separate archives (cross linked, of course)
for each of the different discussion topics?  Each would have
its own librarians, consultants, or priests, and each would
serve a fairly well-defined community.  Access from outside
the community would be by asking someone on the inside.

This is particularly pertinent if you wish to grow a
commercial service -- as I believe you should.  The existence
of a free service like comp.archives makes the next step
very difficult.  (In a like manner, Prof. John McCarthy claims
that the existence of the Arpanet eventually interfered with
commercial network development, leading to the current
revolutionary acceptance of a rather poor FAX standard.)
Because the transition will be difficult, you will have to
pay very close attention to market forces and realistic
business principles.

The current comp.archives appears to be driven by "technology
push": you have the data available, so you're saving it.
Business doesn't work that way; it works by "pull."  You
have to find customers who need a specific type of data,
then you let them pay for the archiving, indexing, and
knowledgeable data experts.

As an extreme case, you can imagine a host of consultants,
each with his or her own archive.  Each consultant advertises
a specialty, collects related data, indexes it according to
personal needs, seeks out customers, prepares reports, and
occasionally even publishes a book.

Instead of following the consultant model, you seem to be
following the public library model.  Why?  There's no money
in it.

					-- Ken Laws

emv@msen.com (Ed Vielmetti) (06/24/91)

<excerpt>
   The current comp.archives appears to be driven by "technology
   push": you have the data available, so you're saving it.
   Business doesn't work that way; it works by "pull."  You
   have to find customers who need a specific type of data,
   then you let them pay for the archiving, indexing, and
   knowledgeable data experts.
</excerpt>

There's something that you're missing here, I think.  No doubt there's
some domain-specific knowledge involved in the production of
comp.archives; it's useful to have a feel for which of the 1000+
archive sites in the world have the greatest likelihood of having
current stuff, which authors are most reliable, who is best organized.

But there's more to it than that.  One of the fundamental technologies
involved is taking a piece of text and answering the question "Is this
interesting?", or more likely "Is this likely to be interesting to Ed
Vielmetti, or Chris Torek, or Mark Moraes, or Richard Stallman, or
Mitch Kapor?"  That's not an easy question, but if you can solve it
(for free) for the person involved, then you can instantly market what
you have to everyone else in the world who respects these people's opinions.

<excerpt>
   As an extreme case, you can imagine a host of consultants,
   each with his or her own archive.  Each consultant advertises
   a specialty, collects related data, indexes it according to
   personal needs, seeks out customers, prepares reports, and
   occasionally even publishes a book.
</excerpt>

That's a good model to follow, and I would hope to start following it.
One of the things that's going to be part of the <tm> MSEN Archive
Service </tm> which is not in comp.archives now is a further breakdown
by subject classification; you'll be able to subscribe to
"msen.archives.tex" and get just the latest and greatest on TeX
software announcements and reviews, or "msen.archives.x" to track the
progress of X11 stuff.  You'll particularly want the last one once
X11R5 rolls around.  Each of these collections will have its own
archivist, who is responsible for quality control and additional
research. 

I'm planning to apply the same technology to related fields as well,
subject to the availability of some copyrighted information (and the
time and investment to pull it off).  For instance, an <tm> MSEN Patent
Watch </tm> subscription would get you news of patent filings,
cross-license agreements, technical information (and raw speculation)
on the viability and challengability of <kw> software patents </kw>,
etc, culled from every available source and tagged (by experts) with
an assessment of quality and value.  I'd bet that this one could even
make a go of it on paper.

<excerpt>
   Instead of following the consultant model, you seem to be
   following the public library model.  Why?  There's no money
   in it.
</excerpt>

One of the problems with the consultant model is that it doesn't scale
too well; you have to do all of the development yourself, and it's
hard to find like-minded people because you're hoarding all of your
efforts.  By pursuing a strategy that includes some component of
public service / pro bono / for the good of the net, and by
aggressively tracking Internet standards (like the multipart,
multimedia "richmail" spec), it's possible to get a substantial amount
of goodwill, and perhaps enough visibility for people to take you
seriously. 

After all, this sort of thing is very old, it's just a high tech
"clipping service".  It's something that I would do <o>just for
myself</o> except that that hasn't been lucrative enough to buy the
necessary hardware and software I'd need to store all of the
interesting things I find, or to license the necessary rights to the
copyrighted newsfeeds (let alone have anything left over for me).  It
doesn't matter if there's "no money in it", so long as the venture is
self-supporting and sustainable.

<sig>
Edward Vielmetti, vice president for research, MSEN Inc. emv@msen.com
"MSEN Archive Service" and "MSEN Patent Watch" are trademarks of MSEN, Inc.
<snappy-quote>
On the Net, the Net-way is best.
	It's just that we are trying to figure out what the Net-way is.
						e. miya
</snappy-quote>
</sig>

<comment>
Markup information provided for use by news readers which implement
the experimental "Mechanisms for Specifying and Describing Internet
Message Bodies", available for anonymous ftp from 
	<msen-archive-information>
	<site>thumper.bellcore.com</site>
	<directory>/pub/nsb</directory>
	</msen-archive-information>
This text has been marked up in the hopes that someone will be able to
print it out on paper and make it pretty!  A five dollar reward goes
to the first nice paper copy.  Send submissions to
<snail>
	Edward Vielmetti
	MSEN, Inc.
	317 S. Division, Suite 218
	Ann Arbor, MI 48104-2203
	USA
</snail>
<markup>
<kw> key words </kw>
<o> emphasis </o>
<tm> trademark </tm>
<sig> signature </sig>
<snail> paper mail ("snail mail") address </snail>
<snappy-quote> when in doubt, quote an RFC. </snappy-quote>
<msgid> message id </msgid>
<from> from </from>
<excerpt> 
	<msgid> LAWS.91Jun22223423@sunset.ai.sri.com </msgid>
	<from> laws@ai.sri.com (Kenneth I. Laws) </from>
</excerpt>
</markup>
</comment>

rodney@sun.ipl.rpi.edu (Rodney Peck II) (06/24/91)

In article <LAWS.91Jun22223423@sunset.ai.sri.com> laws@ai.sri.com (Kenneth I. Laws) writes:
>(... Prof. John McCarthy claims
>that the existence of the Arpanet eventually interfered with
>commercial network development, leading to the current
>revolutionary acceptance of a rather poor FAX standard.)

I think Prof. John McCarthy is making an awful lot of assumptions.  FAXs and
the internet are not all that closely related.  Maybe if you want to make
some sort of argument that the internet had stalled commercial development
of telephone switching networks and their digital side, you might have
something.  Then again, you probably wouldn't since the internet (including
the global portions) is extremely small compared to the phone switching
networks.

>Instead of following the consultant model, you seem to be
>following the public library model.  Why?  There's no money
>in it.

Because there's more to life than money.

Comp.archives seemed to me to be a project that developed as a Neat Thing
that was useful to many people, not a way for some people to get rich.

-- 
Rodney

cmf851@anu.oz.au (Albert Langer) (06/25/91)

In article <EMV.91Jun23144034@bronte.aa.ox.com> emv@msen.com (Ed Vielmetti) 
writes (many things related to an interesting discussion I don't have time to
participate in, so I'm just responding on the occasional side issue):

>Markup information provided for use by news readers which implement
>the experimental "Mechanisms for Specifying and Describing Internet
>Message Bodies", available for anonymous ftp from 

The markup appears to be based on SGML (Standard Generalized Markup
Language, which has an ISO standard and is indeed suitable for maintaining
both text databases and revisable form rich text documents via news).

However if a suitable SGML document type HAS been defined for your
purposes then you ought to publish it and reference it as a public
text. Then you can use a MUCH less verbose (but equally readable)
notation - e.g. omitting or shortening most of the end markers and
making use of various abbreviations and typist techniques.

--
Opinions disclaimed (Authoritative answer from opinion server)
Header reply address wrong. Use cmf851@csc2.anu.edu.au

emv@msen.com (Ed Vielmetti) (06/25/91)

<par> 
As far as it is feasible the IETF "richmail" project is being
pushed to use as simple a subset of SGML as possible so that people
can type it in by hand and not have it distract too much from the
actual text. 
</par>
<excerpt>
   in article 1991Jun24.193928.21180@newshost.anu.edu.au 
   cmf851@anu.oz.au (Albert Langer) writes:   

   However if a suitable SGML document type HAS been defined for your
   purposes then you ought to publish it and reference it as a public
   text. Then you can use a MUCH less verbose (but equally readable)
   notation - e.g. omitting or shortening most of the end markers and
   making use of various abbreviations and typist techniques.
</excerpt>
<par>
There are good reasons not to use the SGML minimization rules, not the
least of which is to minimize the amount of work that "dumb" user
agents have to do to strip out the formatting information.  To quote
from the internet draft --
<excerpt>
            NOTE ON THE RELATIONSHIP OF RICHTEXT TO SGML:   Richtext  is
            decidedly  not  SGML,  and  should  not be used to transport
            arbitrary SGML  documents.   Those  who  wish  to  use  SGML
            document  types  as  a mail transport format should define a
            new text-plus subtype,  e.g.  "text-plus/sgml-dtd-whatever".
            Richtext  is  designed  to  be  compatible  with  SGML,  and
            specifically so  that  it  will  be  possible  to  define  a
            richtext  DTD  if  that  is  desired. However, this does not
            imply that arbitrary SGML can be called richtext,  nor  that
            richtext  implementors have any need to understand SGML; the
            description  in  this  memo  is  a  complete  definition  of
            richtext.						
</excerpt>
The approach of avoiding the complicated minimization rules
facilitates treatment of the text by more general systems, such as
Open Text System's PAT, which can be taught to recognize very simple
tagging schemes but which don't have facilities for disambiguating
whether a minimized end-tag matches one or more begin-tags.  I also
hope to have a system built in GNU Emacs, and while the richtext
scheme seems easy enough with it I don't have any intention of hacking
full-blown SGML in emacs.
</par>
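
To illustrate why explicit end-tags keep things simple for a "dumb" tool,
here is a minimal sketch in Python of a tag matcher. It assumes every
begin-tag has its own spelled-out end-tag, as in the markup used in this
thread; it is not the grammar from the internet draft, and the names
`TAG` and `tags_balanced` are made up for this example.

```python
import re

# Matches <name> or </name>; message ids such as <a.b@host> do not match
# because they contain characters outside the tag-name set.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)>")


def tags_balanced(text: str) -> bool:
    """Return True if every end-tag closes the most recent open begin-tag."""
    stack = []
    for match in TAG.finditer(text):
        closing, name = match.group(1), match.group(2).lower()
        if not closing:
            # Begin-tag: remember it until its end-tag shows up.
            stack.append(name)
        elif not stack or stack.pop() != name:
            # End-tag with nothing open, or closing the wrong element.
            return False
    # Anything still on the stack was opened but never closed.
    return not stack
```

With no minimization there is never a question of which begin-tag an
end-tag belongs to: it is always the one on top of the stack. An excerpt
that is "closed" with a second begin-tag by mistake simply shows up as
unbalanced.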
<par>
As an extreme example, all of the markup in this document is one tag
per line, which is extremely easy to wipe out, even with grep -v.
</par>
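
For readers without grep at hand, the same line-dropping idea can be
sketched in Python. This assumes the one-tag-per-line convention of this
message; the pattern and the name `strip_tag_lines` are illustrative
only, not part of any richtext tool.

```python
import re

# A line that holds exactly one begin- or end-tag and nothing else.
TAG_ONLY_LINE = re.compile(r"^\s*</?[A-Za-z][A-Za-z0-9-]*>\s*$")


def strip_tag_lines(text: str) -> str:
    """Drop every line that consists solely of a single tag."""
    kept = [line for line in text.splitlines()
            if not TAG_ONLY_LINE.match(line)]
    return "\n".join(kept)


sample = "<par>\nHello there.\n</par>\n<sig>\nEd\n</sig>"
print(strip_tag_lines(sample))
```

Lines that carry a tag in the middle of running text are left alone, so
this only cleans up messages that keep their markup on separate lines.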
<sig>
Edward Vielmetti, vice president for research, MSEN Inc. emv@msen.com
<snappy-quote>
By the way, Ed, I think you may be the first person in the history of
the world to successfully send a multifont email message to someone who
wasn't using the same software with which the message was composed.
Congratulations!	nsb@thumper.bellcore.com
</snappy-quote>
</sig>