[comp.sys.apple] Spell-Checker Dictionary Decrypting

q4kx@vax5.cit.cornell.edu (12/01/89)

Dear All,
        Is there any ** legal ** way to decrypt dictionaries from various
spell-checkers?  I am looking for a way to get a file filled with a large
number of words (50,000 - 80,000 should be quite sufficient).  I have
a feeling that various companies would not like me to just steal their
encryption/decryption method though.  I have the following dictionaries
on hand:

Sensible Speller Dictionaries 1 & 2
Time Out Quickspell Dictionary
Appleworks GS Dictionary (But I don't want the definitions)
Pinpoint Spell Checker Dictionary

All are encrypted by some method (for speed I assume).  Sensible Speller
has an option to list words in the dictionary that match some sort of
input.  If you hit "=", it will give you a list of all words in the
dictionary, but unfortunately it will only display the list on the screen.

Does anyone suppose that any of these companies would be willing to
somehow get me a copy of an un-encrypted dictionary?  Typing in a few
thousand words does not exactly excite me.

I realize that this will take up a large amount of disk space, but it should
all fit on an 800K disk.  (avg 8 chars/word * 80,000 words = 640,000 chars)
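
The estimate above can be checked with a few lines of Python. This is only
a sketch: the one-byte separator per word and the reading of "800K" as
800 * 1024 bytes are assumptions not stated in the post, and filesystem
overhead is ignored.

```python
# Rough size estimate for a plain-text word list.
# Assumptions (not from the post): one separator byte per word,
# and "800K" meaning 800 * 1024 bytes with no filesystem overhead.

DISK_BYTES = 800 * 1024


def list_size(word_count: int, avg_len: int, separator_bytes: int = 0) -> int:
    """Bytes needed for word_count words of avg_len characters each."""
    return word_count * (avg_len + separator_bytes)


def fits_on_disk(size_bytes: int, disk_bytes: int = DISK_BYTES) -> bool:
    """True if the list fits in the assumed raw disk capacity."""
    return size_bytes <= disk_bytes


if __name__ == "__main__":
    bare = list_size(80_000, 8)             # the figure from the post
    with_sep = list_size(80_000, 8, 1)      # one separator byte per word
    print(bare, with_sep, fits_on_disk(with_sep))
```

Even with a separator after every word, the list would stay under the
assumed capacity, so the conclusion in the post holds.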

Any ideas?

Joel Sumner        | Bitnet: Q4KX@CORNELLA
GENIE: JOEL.SUMNER | Internet: q4kx@cornella.ccs.cornell.edu
-------------------| Here: q4kx@vax5.cit.cornell.edu
Assumption is the mother of all screw-ups.

saa33413@uxa.cso.uiuc.edu (12/03/89)

I have Sensible Speller IV and AppleWorks 3.0.  I don't know how the AW
dictionary goes together (probably the same as TO QuickSpell, since that is
basically what AW 3.0 uses for its built-in speller), but I have taken a
look at the Sensible Speller dictionaries.  If I remember right, they aren't
really encrypted.  Imagine the disk as one 140K sequential text file.  It
starts with track 0, sector 0 (I assume you're using DOS 3.3 for this) and
goes through each sector in ascending order, then through each track.  The
last character of each word has its sign (high) bit set opposite to
whatever it is in the rest of the word.
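
A minimal Python sketch of that word-splitting idea follows.  It is not
Sensible Speller's actual format, which is described above only from
memory.  It assumes the whole disk has already been read into one byte
string in track/sector order, that every word has at least two characters
(a one-letter word has no "rest of the word" to compare against), and that
the low seven bits of each byte are plain ASCII.  The name split_words is
made up for illustration.

```python
from typing import List


def split_words(data: bytes) -> List[str]:
    """Split a byte stream into words, where the last character of each
    word has its high bit set opposite to the word's other characters.

    Assumption: every word is at least two characters long.
    """
    words: List[str] = []
    current = bytearray()
    base_high = None  # high bit of the first character of the current word

    for byte in data:
        high = byte & 0x80
        current.append(byte & 0x7F)  # keep only the 7-bit character code
        if base_high is None:
            base_high = high         # first character sets the polarity
        elif high != base_high:
            # High bit flipped: this byte is the last character of the word.
            words.append(current.decode("ascii"))
            current = bytearray()
            base_high = None

    # Any unfinished trailing bytes (e.g. padding) are dropped.
    return words
```

To try it on real data, the bytes of a disk image could be passed in, for
example split_words(open(path, "rb").read()), where path is whatever image
file you have made of the dictionary disk.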

Hope this helps.

------------------------------------------------------------------------------
! Scott Alfter                        !      Keep the Chief--Dump Simon      !
!                                     !--------------------------------------!
!Internet:  saa33413@uxa.cso.uiuc.edu ! Note that my Bitnet address will     !
!  Bitnet:  free0066@uiucvme.bitnet   ! change, effective 20 Dec 89.  If you !
!           (until 20 Dec 89; after   ! want to be sure to avoid trouble,    !
!            20 Dec, send mail to     ! use my Internet address:             !
!            free0066@uiucvmd.bitnet) ! saa33413@uxa.cso.uiuc.edu            !
------------------------------------------------------------------------------

spike@world.std.com (Joe Ilacqua) (12/04/89)

In article <1059.257667b7@vax5.cit.cornell.edu> q4kx@vax5.cit.cornell.edu writes:
<        Is there any ** legal ** way to decrypt dictionaries from various
<spell-checkers?  I am looking for a way to get a file filled with a large
<number of words (50,000 - 80,000 should be quite sufficient).

	Do you have to use the spell-checker's dictionaries?  If you
have a UNIX system, there is a dictionary used by its spell checker.  On
this system it is in /usr/dict/words and contains ~25,000 words.  There
have been dictionaries posted to the net in the past; you might find one
if you poke around some archive sites.

	Or you can make your own.  Collect some large on-line texts:
you could use someone's thesis, the man pages on a UNIX system, the
RFCs off of uunet, the Apple Tech Notes, Moby Dick, Wuthering Heights,
the Bible (we have all of the latter three on-line here).  Run them
through something that sorts and makes the words unique (sort(1) will
work on UNIX).
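
	The sort-and-unique step can also be sketched in Python, which
takes care of splitting the text into words first.  This is only one way
to do it: the pattern used to decide what counts as a word (letters, with
an optional internal apostrophe) and the lowercasing are assumptions, and
the name unique_words is made up for illustration.

```python
import re
from typing import Iterable, List

# Assumed definition of a "word": letters, optionally followed by an
# apostrophe and more letters (so "don't" stays in one piece).
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def unique_words(texts: Iterable[str]) -> List[str]:
    """Return the sorted list of distinct lowercased words in the texts."""
    words = set()
    for text in texts:
        for word in WORD_RE.findall(text):
            words.add(word.lower())
    return sorted(words)


if __name__ == "__main__":
    sample = ["Call me Ishmael.", "Don't call me."]
    for word in unique_words(sample):
        print(word)
```

Feeding it the contents of several large text files should give a word
list of the kind described, one word per line.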

	Of course, if you want all of your words to be correctly
spelled English words, you will have to do some editing (for example,
you will get a lot of function names from the man pages and tech
notes).  But if you just need words....

->Spike
-- 
"The World" - Public Access Unix - +1 617-739-9753  24hrs {3,12,24}00bps