allbery@ncoast.UUCP (05/31/87)
[Since this is a beta release, it is here. The final version will show up in comp.sources.unix with the other good stuff. ++bsa] #! /bin/sh # This is a shell archive, meaning: # 1. Remove everything above the #! /bin/sh line. # 2. Save the resulting text in a file. # 3. Execute the file with /bin/sh (not csh) to create the files: # UPDATE2 # Makefile # Othercap.txt # README # UPDATE # WISHES # buildhash.c # config.X # fixdict.sh # This archive created: Sat May 30 17:13:25 1987 export PATH; PATH=/bin:$PATH echo shar: extracting "'UPDATE2'" '(6964 characters)' if test -f 'UPDATE2' then echo shar: will not over-write existing file "'UPDATE2'" else sed 's/^X //' << \SHAR_EOF > 'UPDATE2' X This is the beta-test release of ispell version 2.0. As I discussed in X a previous comp.sources.d posting, I will collect bug fixes for this version, X and then post a final version with dictionary to mod.sources, at which time X I will wash my hands of the bloody thing. X X Because I am short on time, I can only promise to integrate bug fixes. If X you send me improvements, they will very likely disappear into a black hole. X Sorry, but it takes time to integrate every change, even the ones I can't test X because they're for BSD. If you plan on hacking extensively, I'd suggest X waiting for the mod.sources posting, or you may have to repeat some work. X X Send bug reports/fixes to: X X Geoff Kuenning geoff@ITcorp.com {uunet,trwrb}!desint!geoff X --------------------------------------------------------------------------- X INSTRUCTIONS: X X In response to many requests, this posting contains all sources except the X dictionary. Since shar won't overwrite files and some file names have X changed, you should unshar it in an empty directory. If you installed my X previous posting, you may also want to remove expand[12].sed from X /usr/public/lib, since these scripts have been renamed to isexp[1-4].sed. 
X X Once you have unpacked, edit "Makefile" and "config.X" according to the X comments in each. Note that the Makefile edits "config.X" further to X produce "config.h". Then type "make install" and go away for a while X (if you're brave and foolish; otherwise do the equivalent more carefully). X X If you don't already have a dictionary, please don't ask me for one. Ask X a neighbor. If they don't have one, and you can't make one from X /usr/dict/words or /usr/dict/web2 by running it through "munchlist", X try running a bunch of text files through "makedict.sh". (It depends on X UNIX spell, though in a pinch you can do without if your source files are X very good). If all else fails, you'll just have to wait for the mod.sources X posting. X X If you do have a dictionary, and you would like to use the new CAPITALIZE X feature, you will have to convert your dictionary. If you have UNIX spell, X the "fixdict.sh" script will do this for you, without violating any X copyrights or license restrictions. This script replaces the current X dictionary, and writes a (short) list of questionable capitalizations X to standard output; these must be analyzed and, if necessary, corrected X by hand. The file "Othercap.txt" (included in this posting) contains X words that are in dict.191 which will be missed by "fixdict.sh" with X the standard UNIX spell program. X X Problems fixed in this posting: X X (1) Ispell did not duplicate the permissions on the files it edited. X (David Neves) X (2) The actual maximum number of possible corrections was 99, not 100. X (3) Ispell assumed a terminal width of 80 columns, rather than X consulting the termcap entry. X (4) Long lines could wrap around on the terminal, damaging the X display. X (5) The includes of types.h and param.h need to be interchanged on X BSD systems. (Ken Yap, Jacob Gore) X (6) The givehelp() routine now actually waits for a space to be typed X like it claims, instead of just waiting for any character. 
(Steve X Kelem) X (7) Good.c was missing a declaration of the index() (strchr) routine. X (8) The excessive strlen() calls in good.c have been removed, and X register declarations have been added. (Joe Orost, Rich Salz) X (9) Some systems get "multiple symbol definition" messages when X linking (Joe Orost). X (10) Expand[12].sed didn't handle new-format dictionaries. X (11) Some minor errors in the usage message have been corrected. X (12) If a space (or other non-word character) is inserted using "R", X ispell would treat the entire replacement string as a token X and try to find it in the dictionary. X (13) Ispell now follows the proper UNIX procedure for signal catching X (i.e., it doesn't catch SIGINT if it's run in background). X (14) The handling of process stopping on BSD systems has been cleaned X up and made to work right (Mark Davies). X X Improvements added in this posting: X X (1) Ispell's handling of troff size and font requests has again been X improved. (Isaac Balbin, Steve Kelem, Joe Orost) (Everybody seems X to fix the particular problem that bothers their individual world :-). X (2) If ispell is run on a file with an extension of ".tex", it will X automatically go into TeX mode for this and subsequent files. X (Steve Kelem) X (3) The emacs support now includes "ispell-buffer", and ispell is run X from "ispell-program-name" so you can specify an explicit path. X (Stewart Clamen) X (4) There is a TERM_MODE configuration option so you can choose between RAW X and CBREAK modes. The default has been changed to CBREAK (it used X to be RAW) to preserve parity. (Joe Orost) X (5) Term.c will now compile on V7 systems (Joe Orost) X (6) Register declarations have been added throughout. (Joe Orost) X (7) Ispell now buffers stdout, improving display performance slightly. X (8) The backup file extension is now configurable (George Sipe). X (9) All config.X definitions except MAGIC can be overridden with -D X switches (George Sipe). 
X (10) There is now a version.h file, so you will know what version you X have (I guess Larry Wall deserves credit. Even though he didn't X harass me, guilt set in). There is also a -v switch to print the X version information. X (11) (This was a lot tougher than I expected). Ispell now knows about X capitalization and proper names (yay). It recognizes four flavors X of words: lowercase, capitalized, all-capitals, and "followcase". X If a word appears in the dictionary in lowercase, it is accepted X in lowercase, capitalized, or all-capitals. If it is capitalized X in the dictionary, all-lowercase is disallowed. If it is all-caps X in the dictionary, it must always appear in all caps. Finally, X if the word has "weird" capitalization (like the name of my company, X ITcorp or ITCorp), either that capitalization must be followed X *exactly* or else the word must appear in all-caps. More than X one of these variants may occur; "munchlist" will remove unneeded X ones from a dictionary. Finally, if you blow capitalization, X ispell will offer a list of correctly-capitalized alternatives. X Because it increases the size of the hash file, this feature is X optional (see the CAPITALIZE option in config.X). X (12) A new shell script ("fixdict.sh") is provided to aid in converting X your old dictionary to provide capitalization information. X (13) Buildhash now pads the string table to a "struct dent" boundary X in the hash file, so that it will be aligned when reading in. On X many machines, this will speed startup. X (14) The -w option now accepts characters specified in octal with X backslashes like any other UNIX program, as well as the previous X decimal option, and it will also accept numeric strings of less X than three digits. X (15) The ispell.el file now supports ispell-region and ispell-buffer. 
SHAR_EOF fi # end of overwriting check echo shar: extracting "'Makefile'" '(2252 characters)' if test -f 'Makefile' then echo shar: will not over-write existing file "'Makefile'" else sed 's/^X //' << \SHAR_EOF > 'Makefile' X # -*- Mode: Text -*- X X # Look over config.X before building. X # X # You may want to edit BINDIR, LIBDIR, DEFHASH, DEFDICT, MAN1DIR, MAN4DIR X # MAN1EXT, MAN4EXT, and TERMLIB below; X # the Makefile will update all other files to match. X # X # On USG systems, add -DUSG to CFLAGS. X # X # The ifdef NO8BIT may be used if 8 bit extended text characters X # cause problems, or you simply don't wish to allow the feature. X # X # the argument syntax for buildhash to make alternate dictionary files X # is simply: X # X # buildhash <infile> <outfile> X X CC = lcc -v -HL -HD -R tgetflag X CFLAGS = -n -O -DUSG X # BINDIR, LIBDIR, DEFHASH, DEFDICT, MAN1DIR, MAN4DIR, MAN1EXT, MAN4EXT, X # TERMLIB X BINDIR = /usr/sahbin X LIBDIR = /tmp2/lib X DEFHASH = ispell.hash X DEFDICT = dict.191 X MAN1DIR = /usr/man/u_man/man1 X MAN4DIR = /usr/man/u_man/man4 X MAN1EXT = .1l X MAN4EXT = .4l X # TERMLIB = -lcurses X TERMLIB = -ltermcap X X SHELL = /bin/sh X X all: buildhash ispell icombine munchlist isexpand $(DEFHASH) X X ispell.hash: buildhash $(DEFDICT) X ./buildhash $(DEFDICT) $(DEFHASH) X X install: all X cp ispell isexpand munchlist $(BINDIR) X cp ispell.hash $(LIBDIR)/$(DEFHASH) X cp expand1.sed expand2.sed icombine $(LIBDIR) X chmod 755 $(BINDIR)/ispell $(BINDIR)/munchlist $(BINDIR)/isexpand \ X $(LIBDIR)/icombine X chmod 644 $(LIBDIR)/$(DEFHASH) $(LIBDIR)/expand1.sed \ X $(LIBDIR)/expand2.sed X cp ispell.1 $(MAN1DIR)/ispell$(MAN1EXT) X cp ispell.4 $(MAN4DIR)/ispell$(MAN4EXT) X X buildhash: buildhash.o hash.o X $(CC) $(CFLAGS) -o buildhash buildhash.o hash.o X X icombine: icombine.c config.h ispell.h X $(CC) $(CFLAGS) -o icombine icombine.c X X munchlist: munchlist.X Makefile X sed -e 's@!!LIBDIR!!@$(LIBDIR)@' -e 's@!!DEFDICT!!@$(DEFDICT)@' \ X <munchlist.X 
>munchlist X chmod +x munchlist X X isexpand: isexpand.X Makefile X sed -e 's@!!LIBDIR!!@$(LIBDIR)@' isexpand.X >isexpand X chmod +x isexpand X X OBJS=ispell.o term.o good.o lookup.o hash.o tree.o xgets.o X X ispell: $(OBJS) X cc $(CFLAGS) -o ispell $(OBJS) $(TERMLIB) X X $(OBJS) buildhash.o: config.h ispell.h X ispell.o: version.h X X config.h: config.X Makefile X sed -e 's@!!LIBDIR!!@$(LIBDIR)@' -e 's@!!DEFDICT!!@$(DEFDICT)@' \ X -e 's@!!DEFHASH!!@$(DEFHASH)@' <config.X >config.h X X clean: X rm -f *.o buildhash ispell core a.out mon.out hash.out \ X *.stat *.cnt munchlist config.h SHAR_EOF fi # end of overwriting check echo shar: extracting "'Othercap.txt'" '(106 characters)' if test -f 'Othercap.txt' then echo shar: will not over-write existing file "'Othercap.txt'" else sed 's/^X //' << \SHAR_EOF > 'Othercap.txt' X Airedale X Alcibiades X Argo X Argos X Arianist X Arianists X Auckland X CDR X Ethernet X Ethernet's X Ethernets X MIT's X Sikkim SHAR_EOF fi # end of overwriting check echo shar: extracting "'README'" '(6256 characters)' if test -f 'README' then echo shar: will not over-write existing file "'README'" else sed 's/^X //' << \SHAR_EOF > 'README' X -*- Mode:Text -*- X X Ispell consists of two programs: the actual spelling checker, "ispell", X and the hash table builder, "buildhash". Everything is set up so you X can just say "make install" and have everything happen. You might want X to edit the makefile, and ispell.h to change the destination of the X program and the hash table. X X The dictionary comes from the ITS spell dictionary. I got it from X "ml:wba;dict 191", although I don't know that this is the copy currenty X in use on the 20's around MIT. X X ---------------------------------------------------------------------- X X Addendum: X X My eternal gratitude to the author of ispell -- I don't know how I X ever lived without it. I received his permission to post ispell to X the net and have added a GNU EMACS interface. 
Look in the file X ispell.el for installation instructions. X X As far as I know, no one informally "supports" this program. If you X would like to "adopt" it (collect fixes/enhancements and post a new X version periodically), feel free to do so. X X I volunteer to collect dictionary diffs and post a composite diff X periodically. If you add a lot of words to the main dictionary, send X me the diffs between the modified dictionary and the posted one. X Also, if you have access to a TOPS20 system with a more complete X dictionary in ispell format, send me the diffs if you can. Just X PLEASE don't dump an entire dictionary to our site! X X The dictionary posted is one I snarfed from around here -- after X comparison with the one originally supplied, ours appears a tad more X complete and accurate. X X Walt Buehring X Texas Instruments - Computer Science Center X X ARPA: Buehring%TI-CSL@CSNet-Relay X UUCP: {smu, texsun, im4u, rice} ! ti-csl ! buehring X X ---------------------------------------------------------------------- X X The following is the only documentation I could find about the format X of the dictionary. It was written for the TOPS20 speller that ispell X mimics, so I believe most of the information is applicable. It should be X useful if you want to add words to the main dictionary by hand. -WB X X ---------------------------------------------------------------------- X X 11.6 Dictionary flags X X Words in SPELL's main dictionary (but not the other dictionaries) may X have flags associated with them to indicate the legality of suffixes X without the need to keep the full suffixed words in the dictionary. The X flags have "names" consisting of single letters. Their meaning is as X follows: X X Let # and @ be "variables" that can stand for any letter. Upper case X letters are constants. "..." 
stands for any string of zero or more X letters, but note that no word may exist in the dictionary which is not at X least 2 letters long, so, for example, FLY may not be produced by placing X the "Y" flag on "F". Also, no flag is effective unless the word that it X creates is at least 4 letters long, so, for example, WED may not be X produced by placing the "D" flag on "WE". X X "V" flag: X ...E --> ...IVE as in CREATE --> CREATIVE X if # .ne. E, ...# --> ...#IVE as in PREVENT --> PREVENTIVE X X "N" flag: X ...E --> ...ION as in CREATE --> CREATION X ...Y --> ...ICATION as in MULTIPLY --> MULTIPLICATION X if # .ne. E or Y, ...# --> ...#EN as in FALL --> FALLEN X X "X" flag: X ...E --> ...IONS as in CREATE --> CREATIONS X ...Y --> ...ICATIONS as in MULTIPLY --> MULTIPLICATIONS X if # .ne. E or Y, ...# --> ...#ENS as in WEAK --> WEAKENS X X "H" flag: X ...Y --> ...IETH as in TWENTY --> TWENTIETH X if # .ne. Y, ...# --> ...#TH as in HUNDRED --> HUNDREDTH X X "Y" FLAG: X ... --> ...LY as in QUICK --> QUICKLY X X "G" FLAG: X ...E --> ...ING as in FILE --> FILING X if # .ne. E, ...# --> ...#ING as in CROSS --> CROSSING X X "J" FLAG: X ...E --> ...INGS as in FILE --> FILINGS X if # .ne. E, ...# --> ...#INGS as in CROSS --> CROSSINGS X X "D" FLAG: X ...E --> ...ED as in CREATE --> CREATED X if @ .ne. A, E, I, O, or U, X ...@Y --> ...@IED as in IMPLY --> IMPLIED X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U) X ...@# --> ...@#ED as in CROSS --> CROSSED X or CONVEY --> CONVEYED X "T" FLAG: X ...E --> ...EST as in LATE --> LATEST X if @ .ne. A, E, I, O, or U, X ...@Y --> ...@IEST as in DIRTY --> DIRTIEST X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U) X ...@# --> ...@#EST as in SMALL --> SMALLEST X or GRAY --> GRAYEST X X "R" FLAG: X ...E --> ...ER as in SKATE --> SKATER X if @ .ne. A, E, I, O, or U, X ...@Y --> ...@IER as in MULTIPLY --> MULTIPLIER X if # .ne. 
E or Y, or (# = Y and @ = A, E, I, O, or U) X ...@# --> ...@#ER as in BUILD --> BUILDER X or CONVEY --> CONVEYER X X "Z" FLAG: X ...E --> ...ERS as in SKATE --> SKATERS X if @ .ne. A, E, I, O, or U, X ...@Y --> ...@IERS as in MULTIPLY --> MULTIPLIERS X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U) X ...@# --> ...@#ERS as in BUILD --> BUILDERS X or SLAY --> SLAYERS X X "S" FLAG: X if @ .ne. A, E, I, O, or U, X ...@Y --> ...@IES as in IMPLY --> IMPLIES X if # .eq. S, X, Z, or H, X ...# --> ...#ES as in FIX --> FIXES X if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U) X ...@# --> ...@#S as in BAT --> BATS X or CONVEY --> CONVEYS X X "P" FLAG: X if @ .ne. A, E, I, O, or U, X ...@Y --> ...@INESS as in CLOUDY --> CLOUDINESS X if # .ne. Y, or @ = A, E, I, O, or U, X ...@# --> ...@#NESS as in LATE --> LATENESS X or GRAY --> GRAYNESS X X "M" FLAG: X ... --> ...'S as in DOG --> DOG'S X X ---------------------------------------------------------------------- X X [Whew! That's all very nice, but how about a quick reference... -WB] X X V - ive X N - ion, ication, en X X - ions, ications, ens X H - th, ieth X Y - ly X G - ing X J - ings X D - ed X T - est X R - er X Z - ers X S - s, es, ies X P - ness, iness X M - 's SHAR_EOF fi # end of overwriting check echo shar: extracting "'UPDATE'" '(5252 characters)' if test -f 'UPDATE' then echo shar: will not over-write existing file "'UPDATE'" else sed 's/^X //' << \SHAR_EOF > 'UPDATE' X Ispell enhancements - 3/13/87 X X (See three companion postings in net.sources.bugs). X X Here are the enhancements to ispell that I mentioned a couple of days ago. X Because of the number of changes, several of the context diff's are bigger X than the original files. In addition, many people have gotten confused X about versions, since enhancements/fixes have been made by six different X people, counting myself (for the list, see the end of ispell.man). I X have integrated all of these fixes and enhancements in one place. 
X X For these reasons, I have decided to repost all of the sources for ispell, X with one exception -- the dictionary. (A couple of small files, such X as ispell.el, are unchanged, but I decided to repost them anyway for X completeness. If you didn't have ispell before, you now need only the X dictionary). X X The dictionary is a special case: if you think about it, even ordinary X diff's will always work with "patch" on that each-line-is-unique file. X An out-of-place insertion can be corrected by sorting the dictionary X after patching (something that is done anyway as a side effect of the X new "munchlist" script). Because of this, I have decided not to repost X the sizable dictionary. In the process of testing this code, it occurred X to me to run dict.191 through UNIX "spell"; the results of that are X given in three companion postings in net.sources.bugs, which seemed X like a more appropriate place for the diffs. (The postings are not X divided because of their size; see comments in the postings for my X reasons). X X Now, here's what I've done: X X In ispell itself: X X - The personal dictionary is now hashed, just like the main one, and X supports suffixes just like the main one. (It's not actually X integrated with the main one, because expanding the main one X is inefficient and poses a minor but troublesome technical X problem). A personal dictionary of 28000+ words can be read in X within a few minutes (hey, nobody's perfect -- whatcha doing X with such a big dictionary anyway? :-). X - New option "-c" is used by the new munchlist script to generate X suggested root/suffix combinations. X - The -d option can now specify /dev/null, if you want to use X only your personal dictionary (this also saves startup time X with -c, and is used by the "munchlist" script, which is why X I put it in). X - The -p option is now more flexible about its handling of pathnames. X An absolute pathname is always interpreted literally. 
A X relative pathname from WORDLIST is looked up in $HOME first, X then in the current directory. The -p option behaves in the X reverse fashion: current directory first, then $HOME. This X behavior seems more intuitive to me; I'd be interested in X opinions of others if you don't find it intuitive. X - Perhaps most important, I have completely overhauled the logic X in good.c, so that it (I think) matches what the README file X says it should, no more, no less. The code has been extensively X tested, notably by interaction with the new expansion scripts; X nevertheless because of the extent of the changes and the X nature of the logic, I'd suggest a bit of suspicion for a while. X A technique we've found useful here is to do your normal work X with ispell, and then do a final check with UNIX spell or some X other slow, inconvenient program to make sure ispell didn't X screw up. X X New scripts: X X - expand.awk: an obsolete (but correct) awk script that does X the same thing as expand[12].sed, except slower. The awk X script is also much easier to understand than the sed scripts. X Superseded by the sed scripts, except for very short input. X - expand[12].sed: the sed pipe X X "sed -f expand1.sed $file | sed -f expand2.sed" X X where "$file" is a raw dictionary file with suffixes X (e.g., dict.191), generates a list of each root alone, plus X the root expanded with each possible suffix (e.g., X "BOTH/R/Z" produces "BOTH", "BOTHER", and "BOTHERS"). The X output should usually be sorted with the -u switch before X further processing. These scripts are used by 'munchlist'; X they are also useful for (a) checking an ispell dictionary X with some other spell-checking program and (b) figuring X out what a particular suffix does to a certain word without X reading the README file. X - munchlist.sh: a slow, but effective, shell script that takes X lists of expanded or unexpanded words as input and reduces X them to a (usually smaller) list of roots and suffixes. 
The X result is written to standard output. I think the documentation X forgot to mention the input must be one word per line. I X have successfully used this script to combine dict.191 with X /usr/dict/words; it's also useful (and a lot faster) on X private dictionaries. For private dictionaries, it will also X remove any word that has since been added to the main dictionary. X X Oh yes, I almost forgot: the original documentation didn't mention X that ispell is a long-name program. If your "File:" display on the X top line actually contains the misspelled word, you have long-name problems. X My fixes don't address long names, because I finally have a way to X compile long-name programs, thanks to "hash8". X X Geoff Kuenning X geoff@ITcorp.COM X ...!trwrb!desint!geoff SHAR_EOF fi # end of overwriting check echo shar: extracting "'WISHES'" '(2317 characters)' if test -f 'WISHES' then echo shar: will not over-write existing file "'WISHES'" else sed 's/^X //' << \SHAR_EOF > 'WISHES' X Things remaining to be done to ispell: X X - The "munchlist" script can actually increase the size of a X dictionary. For example, munching dict.191 (after my bug fixes X to it) reduced the number of words by about 40, but increased X the number of characters by a small percentage. This is X because munchlist doesn't recognize duplicate suffixes that X generate the same result, except for the three special X "s-ending" suffixes J, Z, and X. For example, right now X munchlist will make BATHER by adding the R suffix to both X BATH and BATHE. It would be nice if munchlist could recognize X the redundancy and reduce its output so that each word was made X in only one way. X - The characters in the -w option should be written to the header X of the hash file, and to a header in the personal dictionary, X so the user doesn't have to remember to specify them every time. X - Buildhash should support the -w option. 
X - Buildhash, munchlist, icombine, and the expand scripts should use X a character other than slash for the flag separator, so that slashes X are available to the -w option. I tend to lean towards commas. X - It might be nice to support multiple personal dictionaries. On X the other hand, it's pretty easy to combine them with "cat". X - Good.c should be table-driven, so that it is easier to modify for X other languages. Ideally, it would support prefixes as well. X - A small amount of string space could be saved if buildhash would X combine strings with common suffixes (e.g., "and" could be stored X as a pointer to the tail of "bland"). X - (Peter Wan) Ispell should have a "server mode" for large sites, to X get rid of the time needed to read in the dictionary. On System V, X this could be accomplished by having the first execution of ispell X read the dictionary into a shared-memory region. Later incarnations X would then get the dictionary by just attaching to the region. X One problem would be that the dictionary gets modified during X the run, so you might still have to do a memory-to-memory copy X after the attach. The size of having two copies of the dictionary X might prohibit this on many machines. Another approach is a X message-based "good.c server", but this too would have to deal X with the possibility of modifying the dictionary. 
SHAR_EOF fi # end of overwriting check echo shar: extracting "'buildhash.c'" '(10320 characters)' if test -f 'buildhash.c' then echo shar: will not over-write existing file "'buildhash.c'" else sed 's/^X //' << \SHAR_EOF > 'buildhash.c' X /* -*- Mode: Text -*- */ X X #define MAIN X X /* X * buildhash.c - make a hash table for ispell X * X * Pace Willisson, 1983 X */ X X #include <ctype.h> X #include <stdio.h> X #ifdef USG X #include <sys/types.h> X #endif X #include <sys/param.h> X #include <sys/stat.h> X #include "config.h" X #include "ispell.h" X X #define NSTAT 100 X struct stat dstat, cstat; X X int numwords, hashsize; X X char *malloc(); X char *realloc (); X X struct dent *hashtbl; X X char *Dfile; X char *Hfile; X X char Cfile[MAXPATHLEN]; X char Sfile[MAXPATHLEN]; X X main (argc,argv) X int argc; X char **argv; X { X FILE *countf; X FILE *statf; X int stats[NSTAT]; X int i; X X if (argc > 1) { X ++argv; X Dfile = *argv; X if (argc > 2) { X ++argv; X Hfile = *argv; X } X else X Hfile = DEFHASH; X } X else { X Dfile = DEFDICT; X Hfile = DEFHASH; X } X X sprintf(Cfile,"%s.cnt",Dfile); X sprintf(Sfile,"%s.stat",Dfile); X X if (stat (Dfile, &dstat) < 0) { X fprintf (stderr, "No dictionary (%s)\n", Dfile); X exit (1); X } X X if (stat (Cfile, &cstat) < 0 || dstat.st_mtime > cstat.st_mtime) X newcount (); X X if ((countf = fopen (Cfile, "r")) == NULL) { X fprintf (stderr, "No count file\n"); X exit (1); X } X numwords = 0; X fscanf (countf, "%d", &numwords); X fclose (countf); X if (numwords == 0) { X fprintf (stderr, "Bad count file\n"); X exit (1); X } X hashsize = numwords; X readdict (); X X if ((statf = fopen (Sfile, "w")) == NULL) { X fprintf (stderr, "Can't create %s\n", Sfile); X exit (1); X } X X for (i = 0; i < NSTAT; i++) X stats[i] = 0; X for (i = 0; i < hashsize; i++) { X struct dent *dp; X int j; X if (hashtbl[i].used == 0) { X stats[0]++; X } else { X for (j = 1, dp = &hashtbl[i]; dp->next != NULL; j++, dp = dp->next) X ; X if (j >= NSTAT) X j = 
NSTAT - 1; X stats[j]++; X } X } X for (i = 0; i < NSTAT; i++) X fprintf (statf, "%d: %d\n", i, stats[i]); X fclose (statf); X X filltable (); X X output (); X exit(0); X } X X output () X { X FILE *outfile; X struct hashheader hashheader; X int strptr, n, i; X X if ((outfile = fopen (Hfile, "w")) == NULL) { X fprintf (stderr, "can't create %s\n",Hfile); X return; X } X hashheader.magic = MAGIC; X hashheader.stringsize = 0; X hashheader.tblsize = hashsize; X fwrite (&hashheader, sizeof hashheader, 1, outfile); X strptr = 0; X for (i = 0; i < hashsize; i++) { X n = strlen (hashtbl[i].word) + 1; X #ifdef CAPITALIZE X if (hashtbl[i].followcase) X n += (hashtbl[i].word[n] & 0xFF) * (n + 1) + 1; X #endif X fwrite (hashtbl[i].word, n, 1, outfile); X hashtbl[i].word = (char *)strptr; X strptr += n; X } X /* Pad file to a struct dent boundary for efficiency. */ X n = (strptr + sizeof hashheader) % sizeof (struct dent); X if (n != 0) { X n = sizeof (struct dent) - n; X strptr += n; X while (--n >= 0) X putc ('\0', outfile); X } X for (i = 0; i < hashsize; i++) { X if (hashtbl[i].next != 0) { X int x; X x = hashtbl[i].next - hashtbl; X hashtbl[i].next = (struct dent *)x; X } else { X hashtbl[i].next = (struct dent *)-1; X } X } X fwrite (hashtbl, sizeof (struct dent), hashsize, outfile); X hashheader.stringsize = strptr; X rewind (outfile); X fwrite (&hashheader, sizeof hashheader, 1, outfile); X fclose (outfile); X } X X filltable () X { X struct dent *freepointer, *nextword, *dp; X int i; X X for (freepointer = hashtbl; freepointer->used; freepointer++) X ; X for (nextword = hashtbl, i = numwords; i != 0; nextword++, i--) { X if (nextword->used == 0) { X continue; X } X if (nextword->next == NULL) { X continue; X } X if (nextword->next >= hashtbl && nextword->next < hashtbl + hashsize) { X continue; X } X dp = nextword; X while (dp->next) { X if (freepointer > hashtbl + hashsize) { X fprintf (stderr, "table overflow\n"); X getchar (); X break; X } X *freepointer = 
*(dp->next); X dp->next = freepointer; X dp = freepointer; X X while (freepointer->used) X freepointer++; X } X } X } X X X readdict () X { X struct dent d; X register struct dent *dp; X char lbuf[100]; X FILE *dictf; X int i; X int h; X int len; X register char *p; X X if ((dictf = fopen (Dfile, "r")) == NULL) { X fprintf (stderr, "Can't open dictionary\n"); X exit (1); X } X X hashtbl = (struct dent *) calloc (numwords, sizeof (struct dent)); X if (hashtbl == NULL) { X fprintf (stderr, "couldn't allocate hash table\n"); X exit (1); X } X X i = 0; X while (fgets (lbuf, sizeof lbuf, dictf) != NULL) { X if ((i & 1023) == 0) { X printf ("%d ", i); X fflush (stdout); X } X i++; X X p = &lbuf [ strlen (lbuf) - 1 ]; X if (*p == '\n') X *p = 0; X X if (makedent (lbuf, &d) < 0) X continue; X X len = strlen (lbuf); X #ifdef CAPITALIZE X if (d.followcase) X d.word = malloc (2 * len + 4); X else X d.word = malloc (len + 1); X #endif X if (d.word == NULL) { X fprintf (stderr, "couldn't allocate space for word %s\n", lbuf); X exit (1); X } X strcpy (d.word, lbuf); X #ifdef CAPITALIZE X if (d.followcase) { X p = d.word + len + 1; X *p++ = 1; /* Count of capitalizations */ X *p++ = '-'; /* Don't keep in pers dict */ X strcpy (p, lbuf); X X } X for (p = d.word; *p; p++) { X if (mylower (*p)) X *p = toupper (*p); X } X #endif X X h = hash (d.word, len, hashsize); X X dp = &hashtbl[h]; X if (dp->used == 0) { X *dp = d; X } else { X X #ifdef CAPITALIZE X while (dp != NULL && strcmp (dp->word, d.word) != 0) X dp = dp->next; X if (dp != NULL) { X if (d.followcase X || (dp->followcase && !d.allcaps X && !d.capitalize)) { X /* Add a specific capitalization */ X if (dp->followcase) { X p = &dp->word[len + 1]; X (*p)++; /* Bump counter */ X dp->word = realloc (dp->word, X ((*p & 0xFF) + 1) * (len + 2)); X if (dp->word == NULL) { X fprintf (stderr, X "couldn't allocate space for word %s\n", X lbuf); X exit (1); X } X p = &dp->word[len + 1]; X p += ((*p & 0xFF) - 1) * (len + 2) + 1; X *p++ 
= '-'; X strcpy (p, X d.followcase ? &d.word[len + 3] : lbuf); X } X else { X /* d.followcase must be true */ X /* thus, d.capitalize and d.allcaps are */ X /* clear */ X free (dp->word); X dp->word = d.word; X dp->followcase = 1; X dp->k_followcase = 1; X /* Code later will clear dp->allcaps. */ X } X } X /* Combine two capitalizations. If d was */ X /* allcaps, dp remains unchanged */ X if (d.allcaps == 0) { X /* dp is the entry that will be kept. If */ X /* dp is followcase, the capitalize flag */ X /* reflects whether capitalization "may" */ X /* occur. If not, it reflects whether it */ X /* "must" occur. */ X if (d.capitalize) { /* ie lbuf was cap'd */ X if (dp->followcase) X dp->capitalize = 1; /* May */ X else if (dp->allcaps) /* ie not lcase */ X dp->capitalize = 1; /* Must */ X } X else { /* lbuf was followc or all-lc */ X if (!dp->followcase) X dp->capitalize = 0; /* May */ X } X dp->k_capitalize = dp->capitalize; X dp->allcaps = 0; X dp->k_allcaps = 0; X } X } X else { X #endif X dp = (struct dent *) malloc (sizeof (struct dent)); X if (dp == NULL) { X fprintf (stderr, X "couldn't allocate space for collision\n"); X exit (1); X } X *dp = d; X dp->next = hashtbl[h].next; X hashtbl[h].next = dp; X } X } X } X printf ("\n"); X } X X /* X * fill in the flags in d, and put a null after the word in s X */ X X makedent (lbuf, d) X char *lbuf; X struct dent *d; X { X char *p, *index(); X X d->next = NULL; X d->used = 1; X d->v_flag = 0; X d->n_flag = 0; X d->x_flag = 0; X d->h_flag = 0; X d->y_flag = 0; X d->g_flag = 0; X d->j_flag = 0; X d->d_flag = 0; X d->t_flag = 0; X d->r_flag = 0; X d->z_flag = 0; X d->s_flag = 0; X d->p_flag = 0; X d->m_flag = 0; X d->keep = 0; X #ifdef CAPITALIZE X d->allcaps = 0; X d->capitalize = 0; X d->followcase = 0; X /* X ** Figure out the capitalization rules from the capitalization of X ** the sample entry. Only one of followcase, allcaps, and capitalize X ** will be set. Combinations are generated by higher-level code. 
X */ X for (p = lbuf; *p && *p != '/'; p++) { X if (mylower (*p)) X break; X } X if (*p == '\0' || *p == '/') X d->allcaps = 1; X else { X for ( ; *p && *p != '/'; p++) { X if (myupper (*p)) X break; X } X if (*p == '\0' || *p == '/') { X /* X ** No uppercase letters follow the lowercase ones. X ** If the first two letters are capitalized, it's X ** "followcase". If the first one is capitalized, it's X ** "capitalize". X */ X if (myupper (lbuf[0])) { X if (myupper (lbuf[1])) X d->followcase = 1; X else X d->capitalize = 1; X } X } X else X d->followcase = 1; /* .../lower/upper */ X } X d->k_allcaps = d->allcaps ; X d->k_capitalize = d->capitalize; X d->k_followcase = d->followcase; X #endif X X p = index (lbuf, '/'); X if (p != NULL) X *p = 0; X if (strlen (lbuf) > WORDLEN - 1) { X printf ("%s: word too big\n", lbuf); X return (-1); X } X X if (p == NULL) X return (0); X X p++; X while (*p != '\0' && *p != '\n') { X if (mylower (*p)) X *p = toupper (*p); X switch (*p) { X case 'V': d->v_flag = 1; break; X case 'N': d->n_flag = 1; break; X case 'X': d->x_flag = 1; break; X case 'H': d->h_flag = 1; break; X case 'Y': d->y_flag = 1; break; X case 'G': d->g_flag = 1; break; X case 'J': d->j_flag = 1; break; X case 'D': d->d_flag = 1; break; X case 'T': d->t_flag = 1; break; X case 'R': d->r_flag = 1; break; X case 'Z': d->z_flag = 1; break; X case 'S': d->s_flag = 1; break; X case 'P': d->p_flag = 1; break; X case 'M': d->m_flag = 1; break; X case 0: X fprintf (stderr, "no flags on word %s\n", lbuf); X continue; X default: X fprintf (stderr, "unknown flag %c word %s\n", X *p, lbuf); X break; X } X p++; X if (*p == '/') /* Handle old-format dictionaries too */ X p++; X } X return (0); X } X X newcount () X { X char buf[200]; X char lastbuf[200]; X FILE *d; X int i; X register char *cp; X X fprintf (stderr, "Counting words in dictionary ...\n"); X X if ((d = fopen (Dfile, "r")) == NULL) { X fprintf (stderr, "Can't open dictionary\n"); X exit (1); X } X X for (i = 0, 
lastbuf[0] = '\0'; fgets (buf, sizeof buf, d); ) { X for (cp = buf; *cp; cp++) { X if (mylower (*cp)) X *cp = toupper (*cp); X } X if (strcmp (buf, lastbuf) != 0) { X if ((++i & 1023) == 0) { X printf ("%d ", i); X fflush (stdout); X } X strcpy (lastbuf, buf); X } X } X fclose (d); X printf ("\n%d words\n", i); X if ((d = fopen (Cfile, "w")) == NULL) { X fprintf (stderr, "can't create %s\n", Cfile); X exit (1); X } X fprintf (d, "%d\n", i); X fclose (d); X } SHAR_EOF fi # end of overwriting check echo shar: extracting "'config.X'" '(4594 characters)' if test -f 'config.X' then echo shar: will not over-write existing file "'config.X'" else sed 's/^X //' << \SHAR_EOF > 'config.X' X /* X * This is the configuration file for ispell. Thanks to Bob McQueer X * for creating it and making the necessary changes elsewhere to X * support it. X * Look through this file from top to bottom, and edit anything that X * needs editing. There are also five or six variables in the X * Makefile that you must edit. Note that the Makefile edits this X * file (config.X) to produce config.h. If you are looking at X * config.h, you're in the wrong file. X * X * Don't change the funny-looking lines with !!'s in them; see the X * Makefile! X */ X X /* X ** library directory for hash table(s) / default hash table name X ** If you intend to use multiple dictionary files, I would suggest X ** LIBDIR be a directory which will contain nothing else, so sensible X ** names can be constructed for the -d option without conflict. X */ X #ifndef LIBDIR X #define LIBDIR "!!LIBDIR!!" X #endif X #ifndef DEFHASH X #define DEFHASH "!!DEFHASH!!" 
X #endif X X #ifdef USG X #define index strchr X #define rindex strrchr X #endif X X /* environment variable for user's word list */ X #ifndef PDICTVAR X #define PDICTVAR "WORDLIST" X #endif X X /* default word list */ X #ifndef DEFPDICT X #define DEFPDICT ".ispell_words" X #endif X X /* environment variable for include file string */ X #ifndef INCSTRVAR X #define INCSTRVAR "INCLUDE_STRING" X #endif X X /* default include string */ X #ifndef DEFINCSTR X #define DEFINCSTR "&Include_File&" X #endif X X /* mktemp template for temporary file - MUST contain 6 consecutive X's */ X #ifndef TEMPNAME X #define TEMPNAME "/tmp/ispellXXXXXX" X #endif X X /* default dictionary file */ X #ifndef DEFDICT X #define DEFDICT "!!DEFDICT!!" X #endif X X /* path to LOOK (if look(1) command is available) */ X #ifndef LOOK X #undef LOOK X #endif X X /* path to egrep (use speeded up version if available) */ X #ifndef EGREPCMD X #define EGREPCMD "/bin/egrep" X #endif X X /* path to wordlist for Lookup command (typically /usr/dict/{words|web2}) */ X #ifndef WORDS X #define WORDS "/usr/dict/words" X #endif X X /* buffer size to use for file names if not in sys/param.h */ X #ifndef MAXPATHLEN X #define MAXPATHLEN 240 X #endif X X /* word length allowed in dictionary by buildhash */ X #define WORDLEN 30 X X /* suppress the 8-bit character feature */ X #ifndef NO8BIT X #define NO8BIT X #endif X X /* maximum number of include files supported by xgets; set to 0 to disable */ X #ifndef MAXINCLUDEFILES X #define MAXINCLUDEFILES 5 X #endif X X /* Approximate number of words in the full dictionary, after munching. X ** Err on the high side unless you are very short on memory, in which X ** case you might want to change the tables in tree.c and also increase X ** MAXPCT. X ** X ** (Note: dict.191 is a bit over 15000 words. dict.191 munched with X ** /usr/dict/words is a little over 28000). X */ X #ifndef BIG_DICT X #define BIG_DICT 29000 X #endif X X /* X ** Maximum hash table fullness percentage. 
Larger numbers trade space X ** for time. X **/ X #ifndef MAXPCT X #define MAXPCT 70 /* Expand table when 70% full */ X #endif X X /* X ** the isXXXX macros normally only check ASCII range. These are used X ** instead for text characters, which we assume may be 8 bit. The X ** NO8BIT ifdef shuts off significance of 8 bit characters. If you are X ** using this, and your ctype.h already masks, you can simplify. X */ X #ifdef NO8BIT X #define myupper(X) isupper((X)&0x7f) X #define mylower(X) islower((X)&0x7f) X #define myspace(X) isspace((X)&0x7f) X #define myalpha(X) isalpha((X)&0x7f) X #else X #define myupper(X) (!((X)&0x80) && isupper(X)) X #define mylower(X) (!((X)&0x80) && islower(X)) X #define myspace(X) (!((X)&0x80) && isspace(X)) X #define myalpha(X) (!((X)&0x80) && isalpha(X)) X #endif X X /* X ** the NOPARITY mask is applied to user input characters from the terminal X ** in order to mask out the parity bit. X */ X #ifdef NO8BIT X #define NOPARITY 0x7f X #else X #define NOPARITY 0xff X #endif X X X /* X ** the terminal mode for ispell, set to CBREAK or RAW X ** X */ X #ifndef TERM_MODE X #define TERM_MODE CBREAK X #endif X X /* X ** Define this if you want your columns of words to be of equal length. X ** This will spread short word lists across the screen instead of down it. X */ X #ifndef EQUAL_COLUMNS X #undef EQUAL_COLUMNS X #endif X X /* X ** This is the extension that will be added to backup files X */ X #ifndef BAKEXT X #define BAKEXT ".bak" X #endif X X /* X ** Define this if you want the capitalization feature. This will increase X ** the size of the hashed dictionary on most 16-bit and some 32-bit machines. X */ X #ifndef CAPITALIZE X #define CAPITALIZE X #endif X X /* X ** Define this if you want your personal dictionary sorted. This may take X ** a long time for very large dictionaries. Dictionaries larger than X ** SORTPERSONAL words will not be sorted. 
X */ X #ifndef SORTPERSONAL X #define SORTPERSONAL 1000 X #endif SHAR_EOF fi # end of overwriting check echo shar: extracting "'fixdict.sh'" '(2502 characters)' if test -f 'fixdict.sh' then echo shar: will not over-write existing file "'fixdict.sh'" else sed 's/^X //' << \SHAR_EOF > 'fixdict.sh' X : Use /bin/sh X # X # Add capitalization information to an ispell dictionary X # X # Usage: X # X # fixdict dict-file X # X # Requires availability of UNIX spell. The new dictionary is X # rewritten in place. A list of words that couldn't be X # resolved (because spell doesn't know them) is written to X # standard output. This list appears in lowercase in the X # dictionary, and if there are any errors they must be edited X # by hand. X # X # The final dictionary appears in expanded form and must be X # passed through munchlist to regenerate suffixes. X # X LIBDIR=/tmp2/lib X EXPAND1=${LIBDIR}/isexp1.sed X EXPAND2=${LIBDIR}/isexp2.sed X EXPAND3=${LIBDIR}/isexp3.sed X EXPAND4=${LIBDIR}/isexp4.sed X TDIR=${TMPDIR:-/tmp} X TMP=${TDIR}/fix$$ X X trap "/bin/rm -f ${TMP}*; exit 1" 1 2 15 X sed -f ${EXPAND1} $1 | sed -f ${EXPAND2} \ X | sed -f ${EXPAND3} | sed -f ${EXPAND4} \ X | tr '[A-Z]' '[a-z]' \ X | spell \ X | sort > ${TMP}a X # X # ${TMP}a contains all the words that spell doesn't like. X # Now figure out which of those are because spell doesn't know them at X # all, and leave those in ${TMP}b. X # X tr '[a-z]' '[A-Z]' < ${TMP}a | spell | tr '[A-Z]' '[a-z]' > ${TMP}b X # X # The wrongly-capitalized words are those that spell didn't object to X # in the last step. Produce a list of them, and capitalize the X # first letter of each. Save this list in ${TMP}c. 
X # X comm -23 ${TMP}a ${TMP}b \ X | sed 's/^a/A/;s/^b/B/;s/^c/C/;s/^d/D/;s/^e/E/;s/^f/F/;s/^g/G/;s/^h/H/ X s/^i/I/;s/^j/J/;s/^k/K/;s/^l/L/;s/^m/M/;s/^n/N/;s/^o/O/;s/^p/P/ X s/^q/Q/;s/^r/R/;s/^s/S/;s/^t/T/;s/^u/U/;s/^v/V/;s/^w/W/;s/^x/X/ X s/^y/Y/;s/^z/Z/' > ${TMP}c X # X # Find out which of those spell objects to, saving the failures in ${TMP}d. X # X spell ${TMP}c > ${TMP}d X # X # Extract the words which were correctly capitalized at the first letter, X # combine them with an all-capitals version of the ones that weren't, and X # put the result into ${TMP}e. X # X (comm -23 ${TMP}c ${TMP}d; tr '[a-z]' '[A-Z]' < ${TMP}d) \ X | sort -o ${TMP}e X # X # At this point, ${TMP}b contains the words that spell just plain doesn't X # like, and ${TMP}e contains the words that are now capitalized correctly. X # X /bin/rm ${TMP}[cd] X # X # Put it all together, rewriting the dictionary in place. X # X sed -f ${EXPAND1} $1 | sed -f ${EXPAND2} \ X | sed -f ${EXPAND3} | sed -f ${EXPAND4} \ X | tr '[A-Z]' '[a-z]' \ X | sort \ X | comm -23 - ${TMP}a \ X | sort -f -o $1 - ${TMP}b ${TMP}e X # X # Finally, write the list of words that have questionable capitalization X # to the standard output. X # X cat ${TMP}b X /bin/rm ${TMP}* SHAR_EOF chmod +x 'fixdict.sh' fi # end of overwriting check # End of shell archive exit 0 -- Brandon S. Allbery {decvax,cbatt,cbosgd}!cwruecmp!ncoast!allbery Tridelta Industries {ames,mit-eddie,talcott}!necntc!ncoast!allbery 7350 Corporate Blvd. necntc!ncoast!allbery@harvard.HARVARD.EDU Mentor, OH 44060 +01 216 255 1080 (also eddie.MIT.EDU)