bobm@rtech.UUCP (01/24/87)
(I sent out this article in net.sources.bugs while I was reading the discussion there. I forgot to cross-post to this group, as I had intended. Sorry for the duplication.)

First of all, ispell is a slick program. I've been using a shell script dialogue tying together grope / spell / grep to do this for years. It's much nicer this way (my script built an edit script to run over the file as it went along - a real kludge).

I had to fix the short -> int tpgrp bug in the signal handling to get it to run (on a Pyramid). Then somebody tried the -l option and it crashed. Go through the "checkfile()" routine in ispell.c. There are a couple of "putc" calls that need to have "if (!lflag)" checks put on them to avoid output to an uninitialized file pointer.

As I said, I really like the thing, but I've been making some mods, and have a few suggestions.

mods:

Command line options to allow alternate dictionary files, alternate personal dictionary files, and allow characters other than alpha to be counted as "word" characters. Also, a shell variable you can set to name your personal dictionary, rather than "ispell.words".

The extra characters option has a provision for specifying full 8-bit characters, intended for international character set use. Actually, the reason I added the option was to be able to get &'s counted for things like "AT&T", and get underscores counted, as these critters have a habit of turning up in technical documents. It was easy to simply make the character check capable of handling 8-bit characters - array lengths can be 256 as easily as 128 - the way of specifying hyperascii on the command line is pretty crude, but it works.

There are a few extra characters that cause odd things to happen, through inability to put words containing them in the dictionary (slashes), collision with formatter syntax (periods, backslashes), or screwing up control (newlines, escapes, tabs), but the results should simply be odd but explicable actions as opposed to crashes.
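
A rough sketch of the 256-entry table idea, for illustration only. This is not the ispell source: the names `wordchar`, `init_wordchars`, `add_wordchars`, and `count_words` are made up here, and the command line parsing that would feed `add_wordchars` is left out.

```c
#include <ctype.h>

/* One entry per possible 8-bit character value; nonzero means the
 * character counts as part of a word.  (Hypothetical names, not ispell's.) */
static unsigned char wordchar[256];

/* Start with plain ASCII letters only. */
static void init_wordchars(void)
{
    int c;

    for (c = 0; c < 256; c++)
        wordchar[c] = (c < 128 && isalpha(c)) ? 1 : 0;
}

/* Mark every character in `extra` as a word character. */
static void add_wordchars(const char *extra)
{
    for (; *extra != '\0'; extra++)
        wordchar[(unsigned char)*extra] = 1;  /* cast avoids a negative index */
}

/* Count the words in `line`: a word is a maximal run of word characters. */
static int count_words(const char *line)
{
    int n = 0, inword = 0;

    for (; *line != '\0'; line++) {
        if (wordchar[(unsigned char)*line]) {
            if (!inword) {
                n++;
                inword = 1;
            }
        } else {
            inword = 0;
        }
    }
    return n;
}
```

After `init_wordchars()` and `add_wordchars("&_")`, strings like "AT&T" and "foo_bar" each scan as one word instead of being split at the punctuation. Indexing through `unsigned char` is what lets the same table serve characters above 127.
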
It's unlikely that you'd want to use those characters except to try to break the program.

To make use of alternate dictionaries without doing a lot of file shuffling, buildhash also gets arguments to specify the input/output files.

These are all pretty simple, and I will send out these mods soon.

Something I am thinking about doing is enhancing the roff handling. What I have in mind is putting roff macros in the dictionary with a "." prefix, and using the flags to indicate special actions for that command. Then, when a formatter macro is found, the dictionary could be checked. If it's not there, do what's done now (simply ignore the token). Some things the flags could cause:

    ignore the whole line, not just the command.

    pick up nroff argument 1 as a file name, query the user, and append it to the list of files to be processed if desired. Used for the .so command, for instance. Ability to pick up an argument other than 1 may be useful for local macros, but gets more complicated.

    start ignoring text until you find a formatter command to turn it back on again. Useful for any roff macros / preprocessor commands that bracket stuff which probably isn't going to be English. Macro definitions, macros intended to bracket code fragments, and eqn spring to mind. There could be a variant of this allowing / disallowing nesting.

    ignore stuff until a line ending with a period - for tbl formatting commands.

There may also be a use for coding special treatments for backslash sequences based on the character following the backslash. Also, ispell doesn't pick up use of the ' instead of . to begin a command, or recognize nroff comments. These are nits - 99.9% of the time people don't use such things anyway.

Some thoughts:

I was surprised to note how the personal dictionary was handled.
If the hash table were restructured to allow insertion of new entries at runtime (use malloc to allocate new nodes, change word from an index to a pointer, etc), you could interpret and enter the personal dictionary into the hash table rather than using an auxiliary structure, except maybe keeping the results of "i" command entries to insert into the file. You would then have one access method instead of two for word lookup, and more important, you could have the slash codes on the words in your personal dictionary as well. Or your personal roff macros if I did what I suggested above. I haven't looked at this in detail, so I don't know if it's really feasible, or how much change is required. If done, another bonus is that you could combine dictionaries on the command line rather than having to duplicate basic stuff across multiple dictionaries to handle special classes of jargon.

To really allow this thing to handle foreign languages, you would need different ending rules. It might be possible to devise abstractions to be coded into the dictionary stating what the substitution rules are for the various flags. I know this one would be a LOT of work. It's just a thought.

Anyway, I LIKE it! Even if I haven't set things up to make it convenient to use it on my usenet articles yet.

--
Bob McQueer
{amdahl, sun, mtxinu, hoptoad, cpsc6a}!rtech!bobm
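
The runtime-insertion idea in the post above might look something like the following sketch. It is not ispell's actual hash table: the names `struct node`, `hash_insert`, and `hash_lookup`, the bucket count, and the hash function are all assumptions picked for illustration, and chaining with malloc'ed nodes is only one way to do it.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1021               /* arbitrary prime, chosen for the sketch */

struct node {
    char *word;                     /* pointer to the word, not a table index */
    unsigned flags;                 /* stand-in for the slash-code flag bits */
    struct node *next;              /* chain of entries sharing a bucket */
};

static struct node *table[NBUCKETS];

/* Simple multiplicative string hash (an assumption, not ispell's). */
static unsigned hash(const char *s)
{
    unsigned h = 0;

    while (*s != '\0')
        h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Return the entry for `word`, or NULL if it is not in the table. */
static struct node *hash_lookup(const char *word)
{
    struct node *p;

    for (p = table[hash(word)]; p != NULL; p = p->next)
        if (strcmp(p->word, word) == 0)
            return p;
    return NULL;
}

/* Add `word` with `flags`; if it is already present, merge the flags.
 * Returns the entry, or NULL if malloc fails. */
static struct node *hash_insert(const char *word, unsigned flags)
{
    struct node *p = hash_lookup(word);
    unsigned h;

    if (p != NULL) {
        p->flags |= flags;
        return p;
    }
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->word = malloc(strlen(word) + 1);
    if (p->word == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->word, word);
    p->flags = flags;
    h = hash(word);
    p->next = table[h];             /* push onto the front of the chain */
    table[h] = p;
    return p;
}
```

Under this scheme, words from the main dictionary, from a personal dictionary, and from any extra dictionaries named on the command line would all go in through `hash_insert`, so a single `hash_lookup` serves every word and the flags travel with each entry.
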
cudcv@warwick.UUCP (01/28/87)
In article <620@rtech.UUCP> bobm@rtech.UUCP (Bob Mcqueer) writes:
>
>To really allow this thing to handle foreign languages, you would need
>different ending rules. It might be possible to devise abstractions
>to be coded into the dictionary stating what the substitution rules are
>for the various flags. I know this one would be a LOT of work. It's just
>a thought.
>
>Bob McQueer
>{amdahl, sun, mtxinu, hoptoad, cpsc6a}!rtech!bobm

Anybody likely to teach it British English ? I like the program, now if only it could spell ...

--
UUCP: ...!mcvax!ukc!warwick!cudcv PHONE: +44 203 523037
JANET: cudcv@uk.ac.warwick.daisy ARPA: cudcv@daisy.warwick.ac.uk
Rob McMahon, Computing Services, Warwick University, Coventry CV4 7AL, England