[sci.lang] TSPELL23.ARC spelling checker

ts@uwasa.fi (Timo Salmi LASK) (04/12/90)

  Spelling checker update (/pc/ts)tspell23.arc at chyde.uwasa.fi,
available by anonymous ftp.
  This update brings a very significant increase in the speed of the
dictionary editor SPELLED: it now runs as much as five times faster than
before. One telling weakness of SPELLED has been that
updating the dictionary is slow. This is because the alphabetical order of
the dictionary must be retained when new words are added, or old ones
deleted. For each new word a slot must first be made by moving all the words
above it up one notch. Since MS-DOS arrays are limited to 64K, this puts a
heavy load on the program, because a huge array system must be used within the
program code (unseen by the user). Starting from release 1.9 of TSPELL I have
used the huge array management in Turbo Professional by TurboPower Software,
but now I have replaced it with my own huge array management code, which is
very much faster. One of the reasons is that TurboPower's code is general. It
is written for any variable type, and has no size limits except the available
memory. The code I have now used is a double pointer system adapted for
strings in particular, and in this technique the size of an array is limited
to 16383 rows, and as many columns. In practice this means that the maximum
number of words that the dictionary may hold is 16383. On the face of it this
seems a severe limitation, but, in fact, it is not. I have noticed that if
one builds the dictionary for one's own purposes, a vocabulary of 10000-12000
words is quite enough. There is a huge amount of slack in the generic
vocabularies, and still they often do not fulfil the user's specific needs.
And the distributed PD version of the spelling checker is limited to 9000
words, anyway. A 22800 word version of SPELLER will still exist, but it is
not (nor has it been) released to the Public Domain.
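  As a rough illustration of the idea described above (not the actual
TSPELL code, which is not shown in this post), the following Python sketch
keeps the words in a fixed-capacity table of references held in alphabetical
order. Adding a word finds its slot and moves only the references above it up
one notch; the strings themselves stay where they are. The class name
SortedDictionary, its methods, and the use of binary search to find the slot
are assumptions made for the example. Only the 16383-row limit comes from the
text.

```python
import bisect

MAX_WORDS = 16383  # row limit quoted in the post


class SortedDictionary:
    """Hypothetical stand-in for a pointer table kept in alphabetical order."""

    def __init__(self, capacity=MAX_WORDS):
        self.capacity = capacity
        self.rows = []  # each entry is a reference to a string, not a copy

    def _slot(self, word):
        # binary search for the alphabetical position of word
        return bisect.bisect_left(self.rows, word)

    def contains(self, word):
        slot = self._slot(word)
        return slot < len(self.rows) and self.rows[slot] == word

    def add(self, word):
        """Insert word at its slot; return False if it is already present."""
        slot = self._slot(word)
        if slot < len(self.rows) and self.rows[slot] == word:
            return False
        if len(self.rows) >= self.capacity:
            raise OverflowError("dictionary is full")
        # list.insert shifts the references above the slot up one notch
        self.rows.insert(slot, word)
        return True

    def delete(self, word):
        """Remove word; return False if it was not in the dictionary."""
        slot = self._slot(word)
        if slot < len(self.rows) and self.rows[slot] == word:
            del self.rows[slot]  # references above the slot move down
            return True
        return False
```

  With the capacity left at its default, a 16384th distinct word is refused
with an error rather than silently dropped, which mirrors the hard limit
described above.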
  There is also another, slight price to pay for the change of the technique.
At the beginning of the program a brief spell (pardon the pun) is taken to
build up the pointer system for the dynamic memory.
  I cannot be really sure, but it seems to me that the old huge array
technique may have been the source of an occasional system crash. In writing
programs with pointer arrays I have noticed that if they are not controlled
very accurately, they cause unexpected behavior. This is natural, since a
"wild" pointer may point anywhere in memory, potentially changing it.
  I have made the same change of technique in the spelling checker. SPELLER
was very fast to begin with, and in just checking text the difference does
not really matter, even if it is there. But if you
choose to update the dictionary immediately with new words as you proceed,
then the difference is significant. If you choose to store new words as you
check a text, they are written to a file called dict.tmp on the default
device.
  If you break out of SPELLER (ctrl-c or break) the temporary file is now
properly closed. Earlier it was left with zero length in case of a break-out.
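  The break-out fix can be illustrated with a small Python sketch. It is an
assumption about the general approach, not SPELLER's own code: new words are
written to the temporary file inside a block that closes the file even when
the user interrupts the run. The function name store_new_words and its
arguments are made up for the example. Only the file name dict.tmp comes from
the text.

```python
def store_new_words(words, path="dict.tmp"):
    """Write words to path, one per line; return how many were written.

    words may be any iterable. If a KeyboardInterrupt (ctrl-c) arrives
    while the words are being produced, the file is still closed and
    keeps everything written up to that point.
    """
    written = 0
    # the with-block closes the file on normal exit and on any exception
    with open(path, "w") as tmp:
        try:
            for word in words:
                tmp.write(word + "\n")
                written += 1
        except KeyboardInterrupt:
            pass  # break-out: keep what has been written so far
    return written
```

  Because the file is closed on the way out, the words stored before the
break remain in it instead of being lost to a zero-length file.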
  A few words about the original TSPELL philosophy. I made the spelling
checker screen oriented on purpose. There are so many spelling checkers which
make a list of the incorrectly spelled words (or rather words that aren't
found in the dictionary), that I decided rather to complement them with the
screen orientation than to write yet another conventional checker. If you use
a Unix system, the Unix spell command is a good example of a list-oriented
spelling checker. On a fast machine it is very nice to use.

Searching Archive: TSPELL23.ARC - Checks English Spelling, T.Salmi
Filename        Comment                             Date      Time  
--------        --------------------------------    ----      ----  
AUXIL.DNY       Seed (read the instructions)      08-11-88  10:09:54
SPELLED.DNY     English dictionary                10-21-89  17:14:26
SPELLED.EXE     Dictionary editor                 04-12-90  08:41:26
SPELLER.EXE     Spelling checker (start here)     04-12-90  08:47:28
SPELMERG.EXE    Fast merging of dictionaries      04-07-90  19:42:18
TSPELL.INF      Document                          04-12-90  12:10:00
TSPROG.INF      List of PD programs from T.Salmi  04-03-90  17:37:40
VAASA.INF       Info: Finland, Vaasa, U of Vaasa  02-02-90  11:52:54
WORDLIST.EXE    Counts frequencies of words       04-07-90  19:42:20
----            ------             ------  -----
0009            263671             157807   41%

...................................................................
Prof. Timo Salmi        (Moderating at anon. ftp site 128.214.12.3)
School of Business Studies, University of Vaasa, SF-65101, Finland
Internet: ts@chyde.uwasa.fi Funet: gado::salmi Bitnet: salmi@finfun