[net.nlang] Webster's Second

jaw@ames.UUCP (James A. Woods) (03/05/85)

#  What are words worth?  -- The Tom Tom Club

   Proper words in proper places, make the true definition of a style.
	-- Jonathan Swift, Letter to a Young Clergyman, 1720

   Words butter no parsnips. -- Southern proverb
_____

     As promised long ago, I am making available the wordlist from Webster's
Second International Dictionary to those with 'ftp' access to ARPAnet.
The kind soul at Bell Labs who provided me with this word hoard maintains
that it is public domain.  The note inside the covers of Webster's Third
indicates a copyright date of 1934 for 'web2'; legal protection for 'web3'
began in 1961 and is still in effect.  Since dictionaries are living
entities, you be the judge of its efficacy -- we will have to wait for
the likes of Lawrence Urdang and Univ. of Toronto to finish input of the OED
for the (pen)ultimate word on the English language.

     Web2 is far too large for 'uucp' transmission.  In fact, I have
shrunk the files for ARPA xmission by a factor of four (to about one MB)
by using a combination of the ever-popular 'compress' program and a
specialized "incremental encoder" written in a few lines of C.  This
has been done in order to lighten the load on our gracious host (RIACS --
Research Institute for Advanced Computer Science), at the expense of
increased decoding time on the recipient machine.  This should all be
invisible to you, if you wish, since the procedure is simply:

	- login via "anonymous ftp" to riacs.ARPA
	- cd ~ftp/pub/web2
	- retrieve web2.shar, web2.sq.Z, and web2a.sq.Z
		
followed by installation with

	sh web2.shar
	make web2

which also makes 'compress' and 'unsqueeze' before turning over 2.4MB of
output to 'sort -f'.  If you think that this is also a ploy to get you to
install the second-generation 'compress' on your system, indeed it is
such.  This way, ARPAnauts can do some one-stop shopping.
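
     The C source of the "incremental encoder" is not reproduced in this
post, and the on-disk layout of the .sq files is not described.  As a rough
illustration only, the sketch below (in Python rather than C) shows the
general idea usually meant by incremental encoding of a sorted list: each
word is stored as the number of leading characters it shares with the
previous word, plus the remaining suffix.  The function names squeeze() and
unsqueeze() and the (count, suffix) pair representation are assumptions made
for this example, not the format used by the distributed files.

```python
def squeeze(words):
    """Front-code a sorted word list into (shared_prefix_len, suffix) pairs.

    Hypothetical helper for illustration; not the web2 on-disk format.
    """
    prev = ""
    pairs = []
    for word in words:
        # Count the leading characters shared with the previous word.
        n = 0
        limit = min(len(prev), len(word))
        while n < limit and prev[n] == word[n]:
            n += 1
        pairs.append((n, word[n:]))
        prev = word
    return pairs


def unsqueeze(pairs):
    """Invert squeeze(): rebuild each word from the previous one."""
    prev = ""
    words = []
    for n, suffix in pairs:
        word = prev[:n] + suffix
        words.append(word)
        prev = word
    return words


if __name__ == "__main__":
    sample = ["aback", "abacus", "abaft", "abandon"]
    packed = squeeze(sample)
    print(packed)
    # Round trip must reproduce the original list exactly.
    assert unsqueeze(packed) == sample
```

Because neighbors in a sorted wordlist share long prefixes, most entries
shrink to a small count and a short suffix, which a general-purpose
compressor can then reduce further.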

     Web2a is a supplementary list of hyphenated terms as well as assorted
noun and adverbial phrases.  Web2 has already served me and others well
in conducting certain frivolous research into "word jazz".  Inquire within.

     -- James A. Woods   {ihnp4,hplabs}!ames!jaw    (or, jaw@riacs)

jaw@ames.UUCP (James A. Woods) (03/07/85)

#  The word "accident" should be erased from the dictionary.
	-- Isaac Bashevis Singer, 1978

     Loose ends:

	(1) The wordlist is just that, no definitions.  You'll
	    just have to wait until one of the copywronged databases
	    (Wang owns the rights for the Random House Dictionary)
	    shows up on one of those cute 550 MB digital CD ROMs.

	(2) The files have been moved to the 'riacs-icarus.arpa' mail
	    gateway to soothe a swamped time-sharing CPU.  (Sorry, Dave!)

	(3) Four-to-one data compression is sub-optimal, I know.
	    (I used, in addition to the incremental encoder
	    [remember zippy.h, K.T.?], "compress -b12" so that
	    machines with tiny address space can unbutton the list.)

	(4) Good luck if you use it as a spelling detector/corrector
	    list!  The pitfalls induced by this are nicely discussed
	    in M. D. McIlroy's "Development of a Spelling List", in
	    IEEE Trans. Commun., January 1982.

-- James A. Woods   {hplabs,ihnp4}!ames!jaw   (or jaw@riacs)

mwherman@watcgl.UUCP (Michael W. Herman) (03/07/85)

>                                        Since dictionaries are living
> entities, you be the judge of its efficacy -- we will have to wait for
> the likes of Lawrence Urdang and Univ. of Toronto to finish input of the OED
> for the (pen)ultimate word on the English language.

HOLD EVERYTHING.  It's the University of Waterloo that is undertaking the
New Oxford Dictionary Project, not that other university down the road.