[sci.lang] Unix-like spelling checker TSCHEK11.ARC update

ts@uwasa.fi (Timo Salmi) (04/07/91)

TSCHEK11.ARC    Unix-like spell by Timo Salmi
Filename        Comment                             Date      Time
--------        --------------------------------    ----      ----
SPELL.EXE       Unix-like spelling checker        04-06-91  22:49:40
SPELLED3.DNY    Dictionary                        04-06-91  22:25:16
SPL.BAT         A simple batch to drive spell     03-30-91  11:13:18
TSCHEK.NWS      News announcements about tschek   04-06-91  22:31:58
TSPROG.INF      List of PD programs from T.Salmi  03-30-91  10:23:20
VAASA.INF       Info: Finland, Vaasa, U of Vaasa  02-02-90  11:52:54
----            ------             ------  -----
0006            158708              75810   53%

Sat 6-Apr-91: I have updated my Unix-like spelling checker, which
lists the words in your text file that are not found in the
accompanying dictionary. It is now available from our archives as
/pc/ts/tschek11.arc. Some of the new features of the update:
 - The program is about one third faster.
 - There are more words in the accompanying ASCII dictionary.
 - Overly long lines in your text files are now simply skipped;
   reading is no longer terminated as in the previous version.
 - Elapsed run time is given at the end of the execution.
 - Switch /b (batch) turns off the header and the footer.
 - The spurious double empty line on the screen when the output is
   redirected to a file has been fixed.
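
The long-line change amounts to skipping a line rather than aborting
the read. A minimal sketch in Python (not the Turbo Pascal of
SPELL.EXE), assuming a hypothetical 255-character limit, the longest
Turbo Pascal string; the post does not say what limit SPELL.EXE uses:

```python
import io

# Hypothetical limit: the post does not state the one SPELL.EXE uses.
MAX_LINE = 255


def read_lines(stream, max_len=MAX_LINE):
    """Return (lines, skipped): lines of at most max_len characters are
    kept; longer ones are counted and skipped instead of ending the read."""
    lines, skipped = [], 0
    for raw in stream:
        line = raw.rstrip("\n")
        if len(line) > max_len:
            skipped += 1  # the previous version would have stopped here
            continue
        lines.append(line)
    return lines, skipped


text = io.StringIO("first line\n" + "x" * 300 + "\nlast line\n")
kept, skipped = read_lines(text)
print(kept, skipped)  # ['first line', 'last line'] 1
```
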
Spell has been programmed in Turbo Pascal 5.0. Checking spelling
involves very time-consuming tasks, and much careful coding is
required to make a spelling checker fast. These tasks include making
a list of the different words of the file to be checked, sorting it,
and comparing it with the words in the dictionary file. I have
gradually improved these routines through many versions of another,
screen-oriented spelling checker of mine. Nevertheless, I used Turbo
Profiler to find further bottlenecks in the code, and was able to
improve two of the critical parts of the present code. As an example,
checking the spelling of a 150-page text with a 13,000-word
dictionary now takes under two minutes on my 386.
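
The three steps (list the different words, sort them, compare with
the dictionary) can be sketched as below. The post does not say how
SPELL.EXE does the comparison; this hypothetical Python sketch assumes
a dictionary of lower-case words already in sorted order and uses a
single merge-like pass over both sorted lists. The word definition in
WORD is likewise an assumption.

```python
import re

# Hypothetical word definition: runs of letters, with internal apostrophes.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def distinct_words(text):
    """Return the different words of the text, lower-cased and sorted."""
    return sorted({w.lower() for w in WORD.findall(text)})


def unknown_words(words, dictionary):
    """Return the words not found in the dictionary.

    Both arguments must be sorted lists of distinct lower-case words;
    each list is walked once, from start to finish.
    """
    missing = []
    i = j = 0
    while i < len(words):
        if j == len(dictionary) or words[i] < dictionary[j]:
            missing.append(words[i])  # dictionary exhausted or already past it
            i += 1
        elif words[i] == dictionary[j]:
            i += 1  # found: advance in both lists
            j += 1
        else:
            j += 1  # dictionary entry is smaller: skip it
    return missing


text = "The spell checker checks teh spelling of the text."
dictionary = sorted(["the", "spell", "checker", "checks", "spelling", "of", "text"])
print(unknown_words(distinct_words(text), dictionary))  # ['teh']
```

Sorting the word list first is what makes the single pass possible:
the dictionary is read once from beginning to end, which suits a
dictionary kept in a file on disk.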
   Although I am not a linguist, let me say something about
dictionaries based on practical experience. A dictionary of
10,000-20,000 words might seem small to you, but it is not. When you
write, the number of _different_ words you use is quite small. Try
some long text of yours with a word frequency counter (you'll find
one in tspell24.arc). The figure you get will probably surprise you.
This means that a dictionary need not be large, provided its words
are well selected and adapted to your own special field and writing
style. The dictionary accompanying tschek11.arc is slanted towards
computer users' terminology, business economics, mathematics,
statistics, and my own writing style.
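
The counter in tspell24.arc is not described here; the following is a
hypothetical Python sketch of the same idea, reporting both the total
number of words and the number of different words. Treating a word as
a run of letters, case folded, is an assumption.

```python
import re
from collections import Counter

# Hypothetical word definition: runs of letters, case folded below.
WORD = re.compile(r"[A-Za-z]+")


def word_stats(text):
    """Return (total words, different words, Counter of word frequencies)."""
    counts = Counter(w.lower() for w in WORD.findall(text))
    return sum(counts.values()), len(counts), counts


sample = "The cat saw the dog, and the dog saw the cat."
total, different, counts = word_stats(sample)
print(total, different)  # 11 5
print(counts["the"])     # 4
```

On a long text the second figure grows far more slowly than the
first, which is the point made above about dictionary size.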
   Although I enjoy writing programs, this one arose solely from my
own practical needs: I wanted a quick and easy-to-use way of checking
the spelling of my ASCII files on a PC.

The wares are available by anonymous ftp from garbo.uwasa.fi, Vaasa,
Finland, 128.214.12.37, or by using our mail server (use the latter
if, and only if, you don't have anonymous ftp).  If you are not
familiar with anonymous ftp or mail servers, I am prepared to send
prerecorded instructions on request.  (If you don't get the
instructions from me within a few days, it will mean that your email
address cannot be reached by a simple email reply.  Contact your
system manager to devise a proper mail path for you, because
unless you do, you wouldn't be able to use the mail server
anyway.  If you are in North America, first consider using an ftp
site near you to spare the overseas load.)

...................................................................
Prof. Timo Salmi        
Moderating at garbo.uwasa.fi anonymous ftp archives 128.214.12.37
School of Business Studies, University of Vaasa, SF-65101, Finland
Internet: ts@chyde.uwasa.fi Funet: gado::salmi Bitnet: salmi@finfun