jaw@ames.UUCP (James A. Woods) (03/05/85)
# What are words worth? -- The Tom Tom Club
Proper words in proper places, make the true definition of a style.
-- Jonathan Swift, Letter to a Young Clergyman, 1720
Words butter no parsnips. -- Southern proverb
_____
As promised long ago, I am making available the wordlist from Webster's
Second International Dictionary to those with 'ftp' access to ARPAnet.
The kind soul at Bell Labs who provided me with this word hoard maintains
that it is public domain. The note inside the covers of Webster's Third
indicates a copyright date of 1934 for 'web2'; legal protection for 'web3'
began in 1961 and is still in effect. Since dictionaries are living
entities, you be the judge of its efficacy -- we will have to wait for
the likes of Lawrence Urdang and Univ. of Toronto to finish input of the OED
for the (pen)ultimate word on the English language.
Web2 is far too large for 'uucp' transmission. In fact, I have
shrunk the files for ARPA xmission by a factor of four (to about one MB)
by using a combination of the ever-popular 'compress' program and a
specialized "incremental encoder" written in a few lines of C. This
has been done in order to lighten the load on our gracious host (RIACS --
Research Institute for Advanced Computer Science), at the expense of
increased decoding time on the recipient machine. This should all be
invisible to you, if you wish, since the procedure is simply:
- login via "anonymous ftp" to riacs.ARPA
- cd ~ftp/pub/web2
- retrieve web2.shar, web2.sq.Z, and web2a.sq.Z
followed by installation with
sh web2.shar
make web2
which also makes 'compress' and 'unsqueeze' before turning over 2.4MB of
output to 'sort -f'. If you think that this is also a ploy to get you to
install the second-generation 'compress' on your system, indeed it is
such. This way, ARPAnauts can do some one-stop shopping.
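For readers curious what an "incremental encoder" does to a sorted wordlist, here is a minimal sketch of the general idea (often called front coding): each word is stored as the number of leading characters it shares with the previous word, plus the remaining suffix. The post does not describe the actual 'web2.sq' format or the C source, so the function names and the (count, suffix) representation below are assumptions made for illustration only.

```python
# Sketch of incremental (front) coding for a sorted wordlist.
# NOTE: hypothetical illustration; this is not the format used by the
# 'unsqueeze' program mentioned in the post, which is not documented there.

def shared_prefix_len(a, b):
    """Count the leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def encode(words):
    """Turn a list of words into (shared_prefix_length, suffix) pairs."""
    prev = ""
    pairs = []
    for w in words:
        k = shared_prefix_len(prev, w)
        pairs.append((k, w[k:]))
        prev = w
    return pairs

def decode(pairs):
    """Rebuild the original words from (shared_prefix_length, suffix) pairs."""
    prev = ""
    words = []
    for k, suffix in pairs:
        w = prev[:k] + suffix
        words.append(w)
        prev = w
    return words

if __name__ == "__main__":
    sample = ["abandon", "abandoned", "abandonment", "abase", "abash"]
    packed = encode(sample)
    print(packed)
    # Decoding must give back exactly what went in.
    assert decode(packed) == sample
```

Because neighbouring entries in a sorted list tend to share long prefixes, most suffixes come out short, and a general-purpose compressor such as 'compress' then has less to chew on. The decoder has to walk the list in order, which is consistent with the extra decoding time mentioned above.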
Web2a is a supplementary list of hyphenated terms as well as assorted
noun and adverbial phrases. Web2 has already served me and others well
in conducting certain frivolous research into "word jazz". Inquire within.
-- James A. Woods {ihnp4,hplabs}!ames!jaw (or, jaw@riacs)
jaw@ames.UUCP (James A. Woods) (03/07/85)
# The word "accident" should be erased from the dictionary.
-- Isaac Bashevis Singer, 1978
Loose ends:
(1) The wordlist is just that, no definitions. You'll
just have to wait until one of the copywronged databases
(Wang owns the rights for the Random House Dictionary)
shows up on one of those cute 550 MB digital CD ROMs.
(2) It has been moved to the 'riacs-icarus.arpa' mail gateway
to soothe a swamped time-sharing CPU. (Sorry, Dave!)
(3) Four-to-one data compression is sub-optimal, I know.
(I used, in addition to the incremental encoder
[remember zippy.h, K.T.?], "compress -b12" so that
machines with tiny address space can unbutton the list.)
(4) Good luck if you use it as a spelling detector/corrector
list! The pitfalls induced by this are nicely discussed
in M. D. McIlroy's "Development of a Spelling List", in
IEEE Trans. Commun., January 1982.
-- James A. Woods {hplabs,ihnp4}!ames!jaw (or jaw@riacs)
mwherman@watcgl.UUCP (Michael W. Herman) (03/07/85)
> Since dictionaries are living
> entities, you be the judge of its efficacy -- we will have to wait for
> the likes of Lawrence Urdang and Univ. of Toronto to finish input of the OED
> for the (pen)ultimate word on the English language.
HOLD EVERYTHING. It's the University of Waterloo that is undertaking the
New Oxford Dictionary Project; not that other university down the road.