ts@uwasa.fi (Timo Salmi) (04/07/91)
TSCHEK11.ARC  Unix-like spell by Timo Salmi

Filename      Comment                           Date      Time
--------      --------------------------------  ----      ----
SPELL.EXE     Unix-like spelling checker        04-06-91  22:49:40
SPELLED3.DNY  Dictionary                        04-06-91  22:25:16
SPL.BAT       A simple batch to drive spell     03-30-91  11:13:18
TSCHEK.NWS    News announcements about tschek   04-06-91  22:31:58
TSPROG.INF    List of PD programs from T.Salmi  03-30-91  10:23:20
VAASA.INF     Info: Finland, Vaasa, U of Vaasa  02-02-90  11:52:54
----          ------  ------  -----
0006          158708   75810    53%

Sat 6-Apr-91: I have updated my Unix-like spelling checker, which lists
those words of your text file that are not found in the accompanying
dictionary. It is now available from our archives as /pc/ts/tschek11.arc.

Some of the new features of the update:
- The program is about one third faster.
- There are more words in the accompanying ascii dictionary.
- Overly long rows in your text files are now simply skipped. Reading
  is no longer terminated as in the previous version.
- Elapsed run time is given at the end of the execution.
- Switch /b (batch) turns off the header and the footer.
- The double empty line on the screen when directing the output to a
  file has been corrected.

Spell has been programmed with Turbo Pascal 5.0. Checking spelling
involves tasks that are very time consuming, and much careful code
writing is required to make a spelling checker fast. These tasks include
making a list of the different words in the file to be checked, sorting
that list, and comparing it with the words in the dictionary file. I
have gradually improved on these routines through many versions of
another, screen-oriented spelling checker of mine. Nevertheless, I used
Turbo Profiler to find further bottlenecks in the code, and could
improve on two of the critical parts of the present code. As an example,
checking the spelling of a 150-page text against a 13000-word dictionary
now takes under two minutes on my 386.

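As a rough illustration of the three steps just described (listing the different words, sorting them, and comparing them with the dictionary), here is a minimal Python sketch. It is not the Turbo Pascal code of spell, whose internals are not given in this announcement. The function names, the definition of a "word", the lowercasing, and the single merge-style pass over a sorted dictionary are all assumptions made for the example.

```python
import re


def distinct_words(text):
    """Return the sorted list of distinct lowercase words in text."""
    # Assumption: a word is a run of ASCII letters, optionally joined
    # by inner apostrophes (as in "dog's"). The real program may differ.
    return sorted(set(re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())))


def unknown_words(text, dictionary):
    """List the words of text that are absent from a sorted dictionary.

    Assumption: dictionary is a list of lowercase words already sorted
    in the same order that Python's sorted() produces.
    """
    missing = []
    i = 0  # current position in the sorted dictionary
    for word in distinct_words(text):
        # Both lists are sorted, so the dictionary is walked only once.
        while i < len(dictionary) and dictionary[i] < word:
            i += 1
        if i == len(dictionary) or dictionary[i] != word:
            missing.append(word)
    return missing


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazzy dog. The dog's asleep."
    words = ["asleep", "brown", "dog", "dog's", "fox",
             "jumps", "over", "quick", "the"]
    print(unknown_words(sample, sorted(words)))
```

Because each different word is looked up only once, and both lists are in the same order, the comparison is a single pass over the dictionary, however often a word is repeated in the text.
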
Although I am not a linguist, let me explain about dictionaries based on
practical experience. A dictionary of 10000-20000 words might seem small
to you, but it is not. When you write, the number of _different_ words
is quite small. Try out some long text of yours with a word frequency
counter (you'll find one in tspell24.arc). The figure you get will
probably be a surprise to you. This means that a dictionary with a good
selection of words, adapted to your own special field and writing style,
need not be large. The dictionary accompanying tschek11.arc is inclined
towards a computer user's terminology, business economics, mathematics,
statistics, and my own writing style.

Although I also enjoy writing programs, this one has arisen solely from
my own practical needs. I wanted a quick and easy-to-use method for
checking the spelling of my ascii files on a PC.

The wares are available by anonymous ftp from garbo.uwasa.fi, Vaasa,
Finland, 128.214.12.37, or by using our mail server (use the latter if,
and only if, you don't have anonymous ftp). If you are not familiar with
anonymous ftp or mail servers, I am prepared to send prerecorded
instructions on request. (If you don't get the instructions from me
within a few days, it will mean that your email address cannot be
reached by a simple email reply. Contact your system manager about
devising a proper mail path for you, because unless you do, you won't be
able to utilize the mail server anyway. If you are in North America,
first consider using an ftp site near you to spare the overseas load.)

...................................................................
Prof. Timo Salmi
Moderating at garbo.uwasa.fi anonymous ftp archives 128.214.12.37
School of Business Studies, University of Vaasa, SF-65101, Finland
Internet: ts@chyde.uwasa.fi Funet: gado::salmi Bitnet: salmi@finfun