[net.misc] Looking for a list of English words

aark (09/08/82)

Can anyone in Netland help me?  I'm looking for a file that contains
a list of English words.  Ideally, the file would have been created
by typing in the entire text of a reasonably good English dictionary,
then deleting all the definitions, pronunciations, etymologies, etc.,
leaving just a list of words.  I don't care if the words are sorted,
if they're each on a separate line, etc.  Does anyone have a file that
even remotely resembles what I've just described?  If so, please send
me mail telling me how I can get ahold of it.

What I'm interested in doing is generating a matrix of transition
probabilities from one letter to another in English words.  That is,
given the letter 'a' in a word, what is the probability that the
next letter is 'a', 'b', 'c', etc.?  Once one has this matrix, one
can, for example, simulate a semi-intelligent monkey at a typewriter.
The monkey types letters at random, but is subject to the transition
probability matrix.  (I call it the Markov Monkey.)
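
The idea above can be sketched roughly as follows. This is only an
illustration, not a program from the post: the names transition_matrix
and monkey and the three-word list are made up, and a real run would
read the words from a dictionary file instead.

```python
import random
from collections import Counter, defaultdict

def transition_matrix(words):
    """Return {letter: {next_letter: probability}} from adjacent letter pairs."""
    counts = defaultdict(Counter)
    for word in words:
        w = word.lower()
        for a, b in zip(w, w[1:]):
            # Only letter-to-letter transitions are counted here.
            if a.isalpha() and b.isalpha():
                counts[a][b] += 1
    # Normalize each row of counts so it sums to 1.
    return {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }

def monkey(matrix, start, length, rng=random):
    """Type up to `length` letters, each drawn from the row of the previous one."""
    out = [start]
    while len(out) < length:
        row = matrix.get(out[-1])
        if not row:
            break  # this letter was never followed by anything in the list
        letters = list(row)
        weights = [row[c] for c in letters]
        out.append(rng.choices(letters, weights=weights)[0])
    return "".join(out)

# Hypothetical stand-in for a real word list.
words = ["banana", "band", "ant"]
matrix = transition_matrix(words)
print(matrix["n"])
print(monkey(matrix, "b", 8))
```

With this tiny list, 'n' is followed by 'a' half the time and by 'd' or
't' a quarter of the time each, so the monkey's output depends entirely
on how representative the word list is.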

Thanks for any help anyone can give me.

Alan Kaminsky
...ihps3!ihuxv!aark

mcewan (09/10/82)

#R:ihuxv:-29700:uiucdcs:10600012:000:316
uiucdcs!mcewan    Sep 10 13:28:00 1982


A simple list of words does not take into account the probability of a
particular word occurring. I would think that examples of English
text would be better. If you're serious about the Markov Monkey, you
would also need the transition probabilities for spaces and punctuation,
which only the text would give you.
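
As a rough illustration of this point, the counting can be done over
running text instead of a word list, so that spaces and punctuation get
rows of their own and common words contribute more often. The function
name text_transitions and the sample sentence are invented for the
sketch; it assumes runs of whitespace should count as a single space.

```python
from collections import Counter, defaultdict

def text_transitions(text):
    """Return {char: {next_char: probability}} over running text."""
    counts = defaultdict(Counter)
    # Assumption: collapse any run of whitespace into one space.
    chars = " ".join(text.lower().split())
    for a, b in zip(chars, chars[1:]):
        counts[a][b] += 1
    return {
        a: {b: n / sum(row.values()) for b, n in row.items()}
        for a, row in counts.items()
    }

# Made-up sample; real use would read a large body of English text.
sample = "the cat sat. the cat ran."
probs = text_transitions(sample)
print(probs[" "])   # which letters start a word, and how often
print(probs["t"])   # includes 't' followed by a space or a period
```

Because "the" and "cat" each occur twice in the sample, their letter
pairs are counted twice, which is the frequency weighting a bare word
list cannot supply.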

mp (09/13/82)

We have a copy of Webster's 2nd, without definitions.  It's too
big to uucp (2.5 megabytes, 235000 words), but if you can get
to the Arpanet, it's in WORDS.DAT[C370DD00] on CMU-A.
It's a great list for hangman games!
	Mark (anywhereexceptalice!physics!mitccc!mp)