ts@uwasa.fi (Timo Salmi LASK) (04/12/90)
Spelling checker update (/pc/ts)tspell23.arc at chyde.uwasa.fi, available by anonymous ftp.

This update brings a very significant increase in the speed of the dictionary editor SPELLED. The editor is now as much as five times faster than before.

One telling weakness of SPELLED has been that updating the dictionary is slow. This is because the alphabetical order of the dictionary must be retained when new words are added or old ones deleted. For each new word a slot must first be made by moving all the words above it up one notch. Since arrays under MS-DOS are limited to 64K, this puts a heavy load on the program, because a huge array system must be used within the program code (unseen by the user).

Starting from release 1.9 of TSPELL I have used the huge array management in Turbo Professional by TurboPower Software, but I have now replaced it with my own huge array management code, which is very much faster. One reason is that TurboPower's code is general: it is written for any variable type and has no size limit other than the available memory. The code I now use is a double pointer system adapted for strings in particular, and in this technique the size of an array is limited to 16383 rows, and as many columns. In practice this means that the maximum number of words that the dictionary may hold is 16383.

On the face of it this seems a severe limitation, but in fact it is not. I have noticed that if one builds the dictionary for one's own purposes, a vocabulary of 10000-12000 words is quite enough. There is a huge amount of slack in the generic vocabularies, and still they often do not fulfil the user's specific needs. And the distributed PD version of the spelling checker is limited to 9000 words anyway. A 22800 word version of SPELLER will still exist, but it is not (nor has it been) released to the public domain.

There is also another, slight price to pay for the change of technique.
At the beginning of the program a brief spell (pardon the pun) is taken to build up the pointer system for the dynamic memory.

I cannot be really sure, but it seems to me that the old huge array technique may have been the source of an occasional system crash. In writing programs with pointer arrays I have noticed that if they are not controlled very accurately, they cause unexpected behavior. This is natural, since a "wild" pointer may point anywhere in memory and potentially change it.

I have made the same change of technique in the spelling checker itself. SPELLER was very fast to begin with, and in just checking text the difference does not really matter, even if it is there. But if you choose to update the dictionary immediately with new words as you proceed, then the difference is significant.

If you choose to store new words as you check a text, they are written to a file called dict.tmp on the default device. If you break out of SPELLER (ctrl-c or break), the temporary file is now properly closed. Earlier it was left with zero length in case of a break-out.

A few words about the original TSPELL philosophy. I made the spelling checker screen oriented on purpose. There are so many spelling checkers which make a list of the incorrectly spelled words (or rather, words that are not found in the dictionary) that I decided to complement them with a screen oriented checker rather than write yet another conventional one. If you use a Unix system, the Unix spell command is a good example of a list oriented spelling checker. On a fast machine it is very nice to use.
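
The slot-making step described earlier (moving every word above the insertion point up one notch) and the 16383-row ceiling can be illustrated with a short sketch. It is written in Python purely for illustration and is not the code used in SPELLED. The class name SortedWordTable, its methods, and the constants are hypothetical. The arithmetic for the limit rests on two assumptions that the announcement does not state: that a single array may occupy at most 65535 bytes, and that each row is reached through a four-byte pointer. The point of a table of references is that only the references are shifted on insertion, while the strings themselves stay where they are.

```python
import bisect

SEGMENT_BYTES = 65535   # assumption: largest single array under MS-DOS (64K)
POINTER_BYTES = 4       # assumption: one pointer occupies four bytes
MAX_ROWS = SEGMENT_BYTES // POINTER_BYTES   # 16383, matching the stated limit


class SortedWordTable:
    """Hypothetical table of references to words, kept in alphabetical order."""

    def __init__(self, max_rows=MAX_ROWS):
        self.max_rows = max_rows
        self.rows = []          # each entry refers to a separately stored word

    def insert(self, word):
        """Add word in order. Return False if it is already present."""
        pos = bisect.bisect_left(self.rows, word)   # binary search for the slot
        if pos < len(self.rows) and self.rows[pos] == word:
            return False
        if len(self.rows) >= self.max_rows:
            raise OverflowError("dictionary is full")
        self.rows.append(None)                      # grow the table by one slot
        for i in range(len(self.rows) - 1, pos, -1):
            self.rows[i] = self.rows[i - 1]         # move each entry up one notch
        self.rows[pos] = word
        return True

    def delete(self, word):
        """Remove word. Return False if it is not in the table."""
        pos = bisect.bisect_left(self.rows, word)
        if pos == len(self.rows) or self.rows[pos] != word:
            return False
        for i in range(pos, len(self.rows) - 1):
            self.rows[i] = self.rows[i + 1]         # close the gap
        self.rows.pop()
        return True
```

Under the two assumed sizes, 65535 // 4 gives 16383, which agrees with the row limit quoted above. Whether that is the actual origin of the limit is a guess.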

Searching Archive: TSPELL23.ARC - Checks English Spelling, T.Salmi

Filename      Comment                           Date      Time
--------      --------------------------------  ----      ----
AUXIL.DNY     Seed (read the instructions)      08-11-88  10:09:54
SPELLED.DNY   English dictionary                10-21-89  17:14:26
SPELLED.EXE   Dictionary editor                 04-12-90  08:41:26
SPELLER.EXE   Spelling checker (start here)     04-12-90  08:47:28
SPELMERG.EXE  Fast merging of dictionaries      04-07-90  19:42:18
TSPELL.INF    Document                          04-12-90  12:10:00
TSPROG.INF    List of PD programs from T.Salmi  04-03-90  17:37:40
VAASA.INF     Info: Finland, Vaasa, U of Vaasa  02-02-90  11:52:54
WORDLIST.EXE  Counts frequencies of words       04-07-90  19:42:20
----          ------  ------  -----
0009          263671  157807  41%

...................................................................
Prof. Timo Salmi      (Moderating at anon. ftp site 128.214.12.3)
School of Business Studies, University of Vaasa, SF-65101, Finland
Internet: ts@chyde.uwasa.fi Funet: gado::salmi Bitnet: salmi@finfun