[comp.text] where to get machine readable text?

pbiron@weber.ucsd.edu (Paul Biron) (09/18/90)

I realize that most discussion in this group is about formatting
text; however, I cannot find a more suitable group to post this
question to.  Sorry if this is a gross misuse of this group's
bandwidth!

I am looking for places (hopefully free, via ftp) to get machine
readable text

e.g.
	the Bible
	the Talmud (sp?)
	works of literature/philosophy (Shakespeare, Aristotle, etc)
	political/social documents (Bill of Rights, etc)
	.
	.
	.

I am interested in such texts solely for research purposes and
will not be reselling them.

Thanx for any pointers,

Paul Biron      pbiron@ucsd.edu        (619) 534-5758
Central University Library, Mail Code C-075-R
Social Sciences DataBase Project
University of California, San Diego, La Jolla, Ca. 92093

av@kielo.uta.fi (Arto V. Viitanen) (09/18/90)

>>>>> On 18 Sep 90 01:35:13 GMT, pbiron@weber.ucsd.edu (Paul Biron) said:

Paul> I realize that most discussion in this group is about formatting
Paul> text; however, I cannot find a more suitable group to post this
Paul> question to.  Sorry if this is a gross misuse of this group's
Paul> bandwidth!

Paul> I am looking for places (hopefully free, via ftp) to get machine
Paul> readable text

Paul> e.g.
Paul> 	the Bible

SIMTEL20 has the King James Version of the Bible, plus search programs for
it, in the directory PD2:<MSDOS2.BIBLE>. The FTP host name is
WSMR-SIMTEL20.ARMY.MIL and its address is 26.2.0.74.

--
Arto V. Viitanen				         email: av@kielo.uta.fi
University Of Tampere,				   	    av@ohdake.cs.uta.fi
Finland

emv@math.lsa.umich.edu (Edward Vielmetti) (09/18/90)

In article <2945@network.ucsd.edu> pbiron@weber.ucsd.edu (Paul Biron) writes:

   I realize that most discussion in this group is about formatting
   text; however, I cannot find a more suitable group to post this
   question to.  Sorry if this is a gross misuse of this group's
   bandwidth!

I've cross-posted this to comp.text.sgml, which is relevant, of sorts.

   I am looking for places (hopefully free, via ftp) to get machine
   readable text (list deleted)

For anonymous FTP from the directory

	sgml.math.lsa.umich.edu:/pub/sgml/oxford-text-archive 

you will find a list of texts which are available (for a fee) from the
Oxford Text Archive and a description of how to get them.  Use of the
texts is generally limited to private scholarly research or when
authorized for teaching purposes, and there may be other, stricter
restrictions.  The list of available texts is pretty long.

Beyond that, there's the Online Book Initiative led by Barry Shein (I
don't know what sort of corpus he has available or how extensive it
is), and also something called Project Gutenberg.  I'll skip detailed
descriptions of either, because I don't have them.

--Ed

Edward Vielmetti, U of Michigan math dept <emv@math.lsa.umich.edu>
moderator, comp.archives

rooks@unc.cs.unc.edu (Mark Rooks) (09/19/90)

In article <2945@network.ucsd.edu> pbiron@weber.ucsd.edu (Paul Biron) writes:

>I am looking for places (hopefully free, via ftp) to get machine
>readable text

>e.g.
>	the Bible
>	the Talmud (sp?)
>	works of literature/philosophy (Shakespeare, Aristotle, etc)
>	political/social documents (Bill of Rights, etc)
>	.
>	.
>	.
>
>I am interested in such texts solely for research purposes and
>will not be reselling them.
>
>Thanx for any pointers,
>
>Paul Biron      pbiron@ucsd.edu        (619) 534-5758

Someone has already posted that the Bible is available through SIMTEL20.
You could also pick up a copy of Computer Shopper and order the King James
Bible on disk for not much more than the cost of the disks.

A few of the major ongoing text projects:

Oxford Text Archive: already mentioned by Ed. Some of the texts are little
more than raw scans, and coverage of the humanities is very spotty. Not
free, though cheap.

ARTFL project: Massive collection of machine-readable French texts from the
16th through the 20th century; based at the University of Chicago.

Renaissance Textbase project: U. of Toronto. What the name implies.

Electronic Text Corp.: Published a CD last year (for only $250) with 50 MB
of American literature (from the Library of America series), the Riverside
Shakespeare, a couple of Bibles, and a few other things.

Georgetown Univ.: Publishing translations of Hegel. Additional material in
the works.

Thesaurus Linguae Graecae (TLG): Virtually all of ancient Greek literature.
Based at UC Irvine.

Thesaurus Linguae Latinae (TLL): A comparable project to the TLG, but for
Latin. Based in Germany.

Most of the major works in philosophy (through the 19th cent.) should be
available by this time next year.

Mark Rooks

ted@nmsu.edu (Ted Dunning) (09/19/90)

   Paul> I am looking for places (hopefully free, via ftp) to get machine
   Paul> readable text


if you are only interested in getting machine readable text, but not
necessarily interested in getting machine readable text of great
historical value, then you have come to the right place.

you are literally swimming in free text.

network news is one of the handiest sources of text known.  mega-words
and mega-words of it.  and if that isn't enough, just wait a week and
you will have that much again.

it is pretty much vernacular and content is usually missing, but it is
text. 

--
ted@nmsu.edu					+---------+
						| In this |
						|  style  |
						|__10/6___|

pbiron@weber.ucsd.edu (Paul Biron) (10/03/90)

A few weeks ago I posted a request for places to find
machine-readable text.  I have received many good leads
(thanx all) and was about to post a summary; however,
I just started to get another batch of responses.

So, I'll wait just a little while longer before
posting a summary so that I can check out these latest
replies.

Thanx,

Paul Biron      pbiron@ucsd.edu        (619) 534-5758
Central University Library, Mail Code C-075-R
Social Sciences DataBase Project
University of California, San Diego, La Jolla, Ca. 92093