[comp.ai] Wanted: on-line literature

mnr@DAISY.LEARNING.CS.CMU.EDU (Marc Ringuette) (10/16/89)

I need to find a source of large quantities (1 to 50 megabytes) of ASCII
text, with a strong preference towards literature.  Novels and essays are
great.  Newspaper or magazine articles are probably OK too.

Please send me Email at "mnr@cs.cmu.edu" if you have any ideas about how I can find
such text accessible from the Internet.  Also let me know if I can include
your response in a summary to post to comp.ai in a week or so.

Thanks!

---  Marc Ringuette  ///  CMU CS Dept, Pittsburgh  ///  mnr@cs.cmu.edu

[ Why?  I need to run some statistics on word usage.  Actually, I'd also
  appreciate pointers to dictionaries of personal and place names, and
  dictionaries with annotations of noun/verb/etc.  Send mail! ]

mnr@daisy.learning.cs.cmu.edu (Marc Ringuette) (10/30/89)

Here is a summary of the responses I got to my request for on-line
literature for doing text analysis.  I haven't obtained any of the
texts yet, so I can't report any success stories.  Please bear in 
mind that these respondents may not expect a flood of requests...

------------------------------
Can you get access to a NeXT machine?  I did some similar experiments
some time ago, and found the complete works of Shakespeare to be a good
source of text...
------------------------------
Stanford just got what are basically online copies of UPI newswires.
The service is called ClariNet.  You might check to see if CMU has such
a service.
------------------------------
  Oxford has some sort of on-line text center, but I think they only
send out tapes.  They have an enormous selection of stuff on-line,
though.
  The Univ. of Waterloo is handling the Electronic Oxford English
Dictionary, which is on-line, and has embedded text.   
  Let me know if you want/need the email addresses.
		Othar Hansson (othar@ernie.berkeley.edu)
------------------------------
The Brown corpus is text on the order of hundreds of thousands of words,
with part of speech given too.  You might ask Michael.Witbrock@cs.cmu.edu.
------------------------------
This summer I worked for Bob Berwick at MIT on projects
that needed huge amounts of text. We had all of Sherlock
Holmes, all the articles from the WSJ for 1988, the
Oxford English Dictionary, etc. Some of this stuff we
got from the MIT Media Lab. I know that all of it
contains special licensing restrictions, so he probably
can't just give you a copy. But he might be able
to tell you where you can get your own copy. So email
to either berwick@chomsky.mit.edu or brent@reagan.ai.mit.edu
(Michael Brent is the graduate student of Bob's who's doing
most of the work on it.)
--Mark Kantrowitz (mkant@cs.cmu.edu)
------------------------------
From: Susan Gauch <sgauch@wellesley.edu>
Try contacting Ed Fox at Virginia Polytechnic.  He has collections of large
amounts of text for distribution on optical disks.  It may be available
in other formats.  
Also, contact John Smith at U. North Carolina (jbs@cs.unc.edu).  He has
found the Bible, plus several literary works, which he has online.
------------------------------
From: "Edward A. Fox" <fox@fox.cs.vt.edu>
  We often have isrmac1.cs.vt.edu up with a CDROM holding lots of
things you might be able to use.  Try anonymous FTP - I will try
to get people to keep that up.  [[ Not up as of 10/26/89...--Marc ]]
  Eventually the CDROMs will be released and that will be easier.
  There is also the ACL Data Collection Initiative which will
soon be releasing mag tapes with lots and lots of text - you can
ask to be on the list by contacting:
	myl%coma@research.att.com
------------------------------
You might be interested in contacting the Oxford Text Archive. They maintain 
a collection of works which they distribute (large texts on mag tape, alas...)
for a very small charge (basically production costs).  The archive also
coordinates with other text archives worldwide.
The archive has serviced my requests promptly and has treated me warmly.
As of 11/4/87, they may be reached at:
s-mail:  Oxford Text Archive
         Oxford University Computing Service
         13 Banbury Road
         Oxford  OX2 6NN,  UK
e-mail: archive@uk.ac.oxford.vax
phone:  +44 (865) 273238

There is also an effort within the Association for Computational Linguistics
to assemble a large collection of text (for example, the 'Data Collection
Initiative'; coding texts using SGML).  Contact Donald Walker at
Bellcore (walker@flash.bellcore.com; 201-829-4312).
   -- John Klockner <7334016@UWAVM.ACS.WASHINGTON.EDU>
------------------------------

That's it.  Good luck!

--  Marc Ringuette  //  CMU CS Dept, Pittsburgh  //  Internet: mnr@cs.cmu.edu