allbery@ncoast.UUCP (05/31/87)
[Since this is a beta release, it is here. The final version will show up in
comp.sources.unix with the other good stuff. ++bsa]
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
# UPDATE2
# Makefile
# Othercap.txt
# README
# UPDATE
# WISHES
# buildhash.c
# config.X
# fixdict.sh
# This archive created: Sat May 30 17:13:25 1987
export PATH; PATH=/bin:$PATH
echo shar: extracting "'UPDATE2'" '(6964 characters)'
if test -f 'UPDATE2'
then
echo shar: will not over-write existing file "'UPDATE2'"
else
sed 's/^X //' << \SHAR_EOF > 'UPDATE2'
X This is the beta-test release of ispell version 2.0. As I discussed in
X a previous comp.sources.d posting, I will collect bug fixes for this version,
X and then post a final version with dictionary to mod.sources, at which time
X I will wash my hands of the bloody thing.
X
X Because I am short on time, I can only promise to integrate bug fixes. If
X you send me improvements, they will very likely disappear into a black hole.
X Sorry, but it takes time to integrate every change, even the ones I can't test
X because they're for BSD. If you plan on hacking extensively, I'd suggest
X waiting for the mod.sources posting, or you may have to repeat some work.
X
X Send bug reports/fixes to:
X
X Geoff Kuenning geoff@ITcorp.com {uunet,trwrb}!desint!geoff
X ---------------------------------------------------------------------------
X INSTRUCTIONS:
X
X In response to many requests, this posting contains all sources except the
X dictionary. Since shar won't overwrite files and some file names have
X changed, you should unshar it in an empty directory. If you installed my
X previous posting, you may also want to remove expand[12].sed from
X /usr/public/lib, since these scripts have been renamed to isexp[1-4].sed.
X
X Once you have unpacked, edit "Makefile" and "config.X" according to the
X comments in each. Note that the Makefile edits "config.X" further to
X produce "config.h". Then type "make install" and go away for a while
X (if you're brave and foolish; otherwise do the equivalent more carefully).
X
X If you don't already have a dictionary, please don't ask me for one. Ask
X a neighbor. If they don't have one, and you can't make one from
X /usr/dict/words or /usr/dict/web2 by running it through "munchlist",
X try running a bunch of text files through "makedict.sh". (It depends on
X UNIX spell, though in a pinch you can do without if your source files are
X very good). If all else fails, you'll just have to wait for the mod.sources
X posting.
X
X If you do have a dictionary, and you would like to use the new CAPITALIZE
X feature, you will have to convert your dictionary. If you have UNIX spell,
X the "fixdict.sh" script will do this for you, without violating any
X copyrights or license restrictions. This script replaces the current
X dictionary, and writes a (short) list of questionable capitalizations
X to standard output; these must be analyzed and, if necessary, corrected
X by hand. The file "Othercap.txt" (included in this posting) contains
X words that are in dict.191 which will be missed by "fixdict.sh" with
X the standard UNIX spell program.
X
X Problems fixed in this posting:
X
X (1) Ispell did not duplicate the permissions on the files it edited.
X (David Neves)
X (2) The actual maximum number of possible corrections was 99, not 100.
X (3) Ispell assumed a terminal width of 80 columns, rather than
X consulting the termcap entry.
X (4) Long lines could wrap around on the terminal, damaging the
X display.
X (5) The includes of types.h and param.h need to be interchanged on
X BSD systems. (Ken Yap, Jacob Gore)
X (6) The givehelp() routine now actually waits for a space to be typed
X like it claims, instead of just waiting for any character. (Steve
X Kelem)
X (7) Good.c was missing a declaration of the index() (strchr) routine.
X (8) The excessive strlen() calls in good.c have been removed, and
X register declarations have been added. (Joe Orost, Rich Salz)
X (9) Some systems get "multiple symbol definition" messages when
X linking (Joe Orost).
X (10) Expand[12].sed didn't handle new-format dictionaries.
X (11) Some minor errors in the usage message have been corrected.
X (12) If a space (or other non-word character) is inserted using "R",
X ispell would treat the entire replacement string as a token
X and try to find it in the dictionary.
X (13) Ispell now follows the proper UNIX procedure for signal catching
X (i.e., it doesn't catch SIGINT if it's run in background).
X (14) The handling of process stopping on BSD systems has been cleaned
X up and made to work right (Mark Davies).
X
X Improvements added in this posting:
X
X (1) Ispell's handling of troff size and font requests has again been
X improved. (Isaac Balbin, Steve Kelem, Joe Orost) (Everybody seems
X to fix the particular problem that bothers their individual world :-).
X (2) If ispell is run on a file with an extension of ".tex", it will
X automatically go into TeX mode for this and subsequent files.
X (Steve Kelem)
X (3) The emacs support now includes "ispell-buffer", and ispell is run
X from "ispell-program-name" so you can specify an explicit path.
X (Stewart Clamen)
X (4) There is a TERM_MODE configuration option so you can choose between RAW
X and CBREAK modes. The default has been changed to CBREAK (it used
X to be RAW) to preserve parity. (Joe Orost)
X (5) Term.c will now compile on V7 systems (Joe Orost)
X (6) Register declarations have been added throughout. (Joe Orost)
X (7) Ispell now buffers stdout, improving display performance slightly.
X (8) The backup file extension is now configurable (George Sipe).
X (9) All config.X definitions except MAGIC can be overridden with -D
X switches (George Sipe).
X (10) There is now a version.h file, so you will know what version you
X have (I guess Larry Wall deserves credit. Even though he didn't
X harass me, guilt set in). There is also a -v switch to print the
X version information.
X (11) (This was a lot tougher than I expected). Ispell now knows about
X capitalization and proper names (yay). It recognizes four flavors
X of words: lowercase, capitalized, all-capitals, and "followcase".
X If a word appears in the dictionary in lowercase, it is accepted
X in lowercase, capitalized, or all-capitals. If it is capitalized
X in the dictionary, all-lowercase is disallowed. If it is all-caps
X in the dictionary, it must always appear in all caps. Finally,
X if the word has "weird" capitalization (like the name of my company,
X ITcorp or ITCorp), either that capitalization must be followed
X *exactly* or else the word must appear in all-caps. More than
X one of these variants may occur; "munchlist" will remove unneeded
X ones from a dictionary. Finally, if you blow capitalization,
X ispell will offer a list of correctly-capitalized alternatives.
X Because it increases the size of the hash file, this feature is
X optional (see the CAPITALIZE option in config.X).
X (12) A new shell script ("fixdict.sh") is provided to aid in converting
X your old dictionary to provide capitalization information.
X (13) Buildhash now pads the string table to a "struct dent" boundary
X in the hash file, so that it will be aligned when reading in. On
X many machines, this will speed startup.
X (14) The -w option now accepts characters specified in octal with
X backslashes like any other UNIX program, as well as the previous
X decimal option, and it will also accept numeric strings of less
X than three digits.
X (15) The ispell.el file now supports ispell-region and ispell-buffer.
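
The capitalization rules in improvement (11) above can be sketched in a
few lines of C. This is only an illustration, not the code in good.c or
buildhash.c: the names word_case, classify(), same_letters() and case_ok()
are invented here, it is written in ANSI C rather than the K&R style of
these sources, and it assumes plain ASCII letters.

```c
#include <ctype.h>
#include <string.h>

/* The four "flavors" of words named in item (11). */
enum word_case { CASE_LOWER, CASE_CAPITALIZED, CASE_ALLCAPS, CASE_FOLLOW };

/* Classify a word by its pattern of upper- and lowercase letters. */
static enum word_case classify(const char *w)
{
    size_t i, upper = 0, lower = 0;

    for (i = 0; w[i] != '\0'; i++) {
        if (isupper((unsigned char)w[i]))
            upper++;
        else if (islower((unsigned char)w[i]))
            lower++;
    }
    if (lower == 0)
        return CASE_ALLCAPS;            /* no lowercase letters at all */
    if (upper == 0)
        return CASE_LOWER;
    if (upper == 1 && isupper((unsigned char)w[0]))
        return CASE_CAPITALIZED;        /* only the first letter is upper */
    return CASE_FOLLOW;                 /* "weird" capitalization */
}

/* Return 1 if the two words match when case is ignored. */
static int same_letters(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;                    /* both must end together */
}

/* Return 1 if "typed" is an acceptable capitalization of entry "dict". */
static int case_ok(const char *dict, const char *typed)
{
    enum word_case t = classify(typed);

    if (!same_letters(dict, typed))
        return 0;
    if (t == CASE_ALLCAPS)
        return 1;                       /* all-caps is always allowed */
    switch (classify(dict)) {
    case CASE_LOWER:
        return t == CASE_LOWER || t == CASE_CAPITALIZED;
    case CASE_CAPITALIZED:
        return t == CASE_CAPITALIZED;   /* all-lowercase is disallowed */
    case CASE_ALLCAPS:
        return 0;                       /* must always be all caps */
    case CASE_FOLLOW:
        return strcmp(dict, typed) == 0;    /* must match exactly */
    }
    return 0;
}
```

With these definitions, case_ok ("Ethernet", "ethernet") is rejected while
case_ok ("Ethernet", "ETHERNET") is accepted, and a followcase entry such
as ITcorp matches only ITcorp or ITCORP.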
SHAR_EOF
fi # end of overwriting check
echo shar: extracting "'Makefile'" '(2252 characters)'
if test -f 'Makefile'
then
echo shar: will not over-write existing file "'Makefile'"
else
sed 's/^X //' << \SHAR_EOF > 'Makefile'
X # -*- Mode: Text -*-
X
X # Look over config.X before building.
X #
X # You may want to edit BINDIR, LIBDIR, DEFHASH, DEFDICT, MAN1DIR, MAN4DIR
X # MAN1EXT, MAN4EXT, and TERMLIB below;
X # the Makefile will update all other files to match.
X #
X # On USG systems, add -DUSG to CFLAGS.
X #
X # The ifdef NO8BIT may be used if 8 bit extended text characters
X # cause problems, or you simply don't wish to allow the feature.
X #
X # the argument syntax for buildhash to make alternate dictionary files
X # is simply:
X #
X # buildhash <infile> <outfile>
X
X CC = lcc -v -HL -HD -R tgetflag
X CFLAGS = -n -O -DUSG
X # BINDIR, LIBDIR, DEFHASH, DEFDICT, MAN1DIR, MAN4DIR, MAN1EXT, MAN4EXT,
X # TERMLIB
X BINDIR = /usr/sahbin
X LIBDIR = /tmp2/lib
X DEFHASH = ispell.hash
X DEFDICT = dict.191
X MAN1DIR = /usr/man/u_man/man1
X MAN4DIR = /usr/man/u_man/man4
X MAN1EXT = .1l
X MAN4EXT = .4l
X # TERMLIB = -lcurses
X TERMLIB = -ltermcap
X
X SHELL = /bin/sh
X
X all: buildhash ispell icombine munchlist isexpand $(DEFHASH)
X
X ispell.hash: buildhash $(DEFDICT)
X ./buildhash $(DEFDICT) $(DEFHASH)
X
X install: all
X cp ispell isexpand munchlist $(BINDIR)
X cp ispell.hash $(LIBDIR)/$(DEFHASH)
X cp expand1.sed expand2.sed icombine $(LIBDIR)
X chmod 755 $(BINDIR)/ispell $(BINDIR)/munchlist $(BINDIR)/isexpand \
X $(LIBDIR)/icombine
X chmod 644 $(LIBDIR)/$(DEFHASH) $(LIBDIR)/expand1.sed \
X $(LIBDIR)/expand2.sed
X cp ispell.1 $(MAN1DIR)/ispell$(MAN1EXT)
X cp ispell.4 $(MAN4DIR)/ispell$(MAN4EXT)
X
X buildhash: buildhash.o hash.o
X $(CC) $(CFLAGS) -o buildhash buildhash.o hash.o
X
X icombine: icombine.c config.h ispell.h
X $(CC) $(CFLAGS) -o icombine icombine.c
X
X munchlist: munchlist.X Makefile
X sed -e 's@!!LIBDIR!!@$(LIBDIR)@' -e 's@!!DEFDICT!!@$(DEFDICT)@' \
X <munchlist.X >munchlist
X chmod +x munchlist
X
X isexpand: isexpand.X Makefile
X sed -e 's@!!LIBDIR!!@$(LIBDIR)@' isexpand.X >isexpand
X chmod +x isexpand
X
X OBJS=ispell.o term.o good.o lookup.o hash.o tree.o xgets.o
X
X ispell: $(OBJS)
X cc $(CFLAGS) -o ispell $(OBJS) $(TERMLIB)
X
X $(OBJS) buildhash.o: config.h ispell.h
X ispell.o: version.h
X
X config.h: config.X Makefile
X sed -e 's@!!LIBDIR!!@$(LIBDIR)@' -e 's@!!DEFDICT!!@$(DEFDICT)@' \
X -e 's@!!DEFHASH!!@$(DEFHASH)@' <config.X >config.h
X
X clean:
X rm -f *.o buildhash ispell core a.out mon.out hash.out \
X *.stat *.cnt munchlist config.h
SHAR_EOF
fi # end of overwriting check
echo shar: extracting "'Othercap.txt'" '(106 characters)'
if test -f 'Othercap.txt'
then
echo shar: will not over-write existing file "'Othercap.txt'"
else
sed 's/^X //' << \SHAR_EOF > 'Othercap.txt'
X Airedale
X Alcibiades
X Argo
X Argos
X Arianist
X Arianists
X Auckland
X CDR
X Ethernet
X Ethernet's
X Ethernets
X MIT's
X Sikkim
SHAR_EOF
fi # end of overwriting check
echo shar: extracting "'README'" '(6256 characters)'
if test -f 'README'
then
echo shar: will not over-write existing file "'README'"
else
sed 's/^X //' << \SHAR_EOF > 'README'
X -*- Mode:Text -*-
X
X Ispell consists of two programs: the actual spelling checker, "ispell",
X and the hash table builder, "buildhash". Everything is set up so you
X can just say "make install" and have everything happen. You might want
X to edit the makefile, and ispell.h to change the destination of the
X program and the hash table.
X
X The dictionary comes from the ITS spell dictionary. I got it from
X "ml:wba;dict 191", although I don't know that this is the copy currently
X in use on the 20's around MIT.
X
X ----------------------------------------------------------------------
X
X Addendum:
X
X My eternal gratitude to the author of ispell -- I don't know how I
X ever lived without it. I received his permission to post ispell to
X the net and have added a GNU EMACS interface. Look in the file
X ispell.el for installation instructions.
X
X As far as I know, no one informally "supports" this program. If you
X would like to "adopt" it (collect fixes/enhancements and post a new
X version periodically), feel free to do so.
X
X I volunteer to collect dictionary diffs and post a composite diff
X periodically. If you add a lot of words to the main dictionary, send
X me the diffs between the modified dictionary and the posted one.
X Also, if you have access to a TOPS20 system with a more complete
X dictionary in ispell format, send me the diffs if you can. Just
X PLEASE don't dump an entire dictionary to our site!
X
X The dictionary posted is one I snarfed from around here -- after
X comparison with the one originally supplied, ours appears a tad more
X complete and accurate.
X
X Walt Buehring
X Texas Instruments - Computer Science Center
X
X ARPA: Buehring%TI-CSL@CSNet-Relay
X UUCP: {smu, texsun, im4u, rice} ! ti-csl ! buehring
X
X ----------------------------------------------------------------------
X
X The following is the only documentation I could find about the format
X of the dictionary. It was written for the TOPS20 speller that ispell
X mimics, so I believe most of the information is applicable. It should be
X useful if you want to add words to the main dictionary by hand. -WB
X
X ----------------------------------------------------------------------
X
X 11.6 Dictionary flags
X
X Words in SPELL's main dictionary (but not the other dictionaries) may
X have flags associated with them to indicate the legality of suffixes
X without the need to keep the full suffixed words in the dictionary. The
X flags have "names" consisting of single letters. Their meaning is as
X follows:
X
X Let # and @ be "variables" that can stand for any letter. Upper case
X letters are constants. "..." stands for any string of zero or more
X letters, but note that no word may exist in the dictionary which is not at
X least 2 letters long, so, for example, FLY may not be produced by placing
X the "Y" flag on "F". Also, no flag is effective unless the word that it
X creates is at least 4 letters long, so, for example, WED may not be
X produced by placing the "D" flag on "WE".
X
X "V" flag:
X ...E --> ...IVE as in CREATE --> CREATIVE
X if # .ne. E, ...# --> ...#IVE as in PREVENT --> PREVENTIVE
X
X "N" flag:
X ...E --> ...ION as in CREATE --> CREATION
X ...Y --> ...ICATION as in MULTIPLY --> MULTIPLICATION
X if # .ne. E or Y, ...# --> ...#EN as in FALL --> FALLEN
X
X "X" flag:
X ...E --> ...IONS as in CREATE --> CREATIONS
X ...Y --> ...ICATIONS as in MULTIPLY --> MULTIPLICATIONS
X if # .ne. E or Y, ...# --> ...#ENS as in WEAK --> WEAKENS
X
X "H" flag:
X ...Y --> ...IETH as in TWENTY --> TWENTIETH
X if # .ne. Y, ...# --> ...#TH as in HUNDRED --> HUNDREDTH
X
X "Y" FLAG:
X ... --> ...LY as in QUICK --> QUICKLY
X
X "G" FLAG:
X ...E --> ...ING as in FILE --> FILING
X if # .ne. E, ...# --> ...#ING as in CROSS --> CROSSING
X
X "J" FLAG:
X ...E --> ...INGS as in FILE --> FILINGS
X if # .ne. E, ...# --> ...#INGS as in CROSS --> CROSSINGS
X
X "D" FLAG:
X ...E --> ...ED as in CREATE --> CREATED
X if @ .ne. A, E, I, O, or U,
X ...@Y --> ...@IED as in IMPLY --> IMPLIED
X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X ...@# --> ...@#ED as in CROSS --> CROSSED
X or CONVEY --> CONVEYED
X "T" FLAG:
X ...E --> ...EST as in LATE --> LATEST
X if @ .ne. A, E, I, O, or U,
X ...@Y --> ...@IEST as in DIRTY --> DIRTIEST
X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X ...@# --> ...@#EST as in SMALL --> SMALLEST
X or GRAY --> GRAYEST
X
X "R" FLAG:
X ...E --> ...ER as in SKATE --> SKATER
X if @ .ne. A, E, I, O, or U,
X ...@Y --> ...@IER as in MULTIPLY --> MULTIPLIER
X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X ...@# --> ...@#ER as in BUILD --> BUILDER
X or CONVEY --> CONVEYER
X
X "Z" FLAG:
X ...E --> ...ERS as in SKATE --> SKATERS
X if @ .ne. A, E, I, O, or U,
X ...@Y --> ...@IERS as in MULTIPLY --> MULTIPLIERS
X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X ...@# --> ...@#ERS as in BUILD --> BUILDERS
X or SLAY --> SLAYERS
X
X "S" FLAG:
X if @ .ne. A, E, I, O, or U,
X ...@Y --> ...@IES as in IMPLY --> IMPLIES
X if # .eq. S, X, Z, or H,
X ...# --> ...#ES as in FIX --> FIXES
X if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
X ...@# --> ...@#S as in BAT --> BATS
X or CONVEY --> CONVEYS
X
X "P" FLAG:
X if @ .ne. A, E, I, O, or U,
X ...@Y --> ...@INESS as in CLOUDY --> CLOUDINESS
X if # .ne. Y, or @ = A, E, I, O, or U,
X ...@# --> ...@#NESS as in LATE --> LATENESS
X or GRAY --> GRAYNESS
X
X "M" FLAG:
X ... --> ...'S as in DOG --> DOG'S
X
X ----------------------------------------------------------------------
X
X [Whew! That's all very nice, but how about a quick reference... -WB]
X
X V - ive
X N - ion, ication, en
X X - ions, ications, ens
X H - th, ieth
X Y - ly
X G - ing
X J - ings
X D - ed
X T - est
X R - er
X Z - ers
X S - s, es, ies
X P - ness, iness
X M - 's
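
As a rough illustration of the rules above, here is a sketch of how the
D, G, and S flags might be applied to an uppercase root. It is not taken
from good.c or the expand scripts: expand_flag() and is_vowel() are
hypothetical names, the code is ANSI C, and only three of the fourteen
flags are handled.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if c is one of the constants A, E, I, O, or U. */
static int is_vowel(char c)
{
    return c != '\0' && strchr("AEIOU", c) != NULL;
}

/*
 * Apply one suffix flag to an uppercase root, writing the result to out.
 * Returns 1 on success, or 0 if the flag is not handled here, the root is
 * shorter than 2 letters, the result would be shorter than 4 letters, or
 * out is too small.
 */
static int expand_flag(const char *root, char flag, char *out, size_t outsz)
{
    size_t len = strlen(root);
    size_t keep = len;          /* how many root characters are kept */
    const char *suffix;
    char last, prev;

    if (len < 2)
        return 0;
    last = root[len - 1];
    prev = root[len - 2];

    switch (flag) {
    case 'G':                   /* FILE --> FILING, CROSS --> CROSSING */
        if (last == 'E')
            keep = len - 1;
        suffix = "ING";
        break;
    case 'D':                   /* CREATE --> CREATED, IMPLY --> IMPLIED */
        if (last == 'E')
            suffix = "D";
        else if (last == 'Y' && !is_vowel(prev)) {
            keep = len - 1;
            suffix = "IED";
        } else
            suffix = "ED";
        break;
    case 'S':                   /* IMPLY --> IMPLIES, FIX --> FIXES */
        if (last == 'Y' && !is_vowel(prev)) {
            keep = len - 1;
            suffix = "IES";
        } else if (strchr("SXZH", last) != NULL)
            suffix = "ES";
        else
            suffix = "S";
        break;
    default:
        return 0;               /* flag not covered by this sketch */
    }

    if (keep + strlen(suffix) < 4 || keep + strlen(suffix) + 1 > outsz)
        return 0;
    memcpy(out, root, keep);
    strcpy(out + keep, suffix);
    return 1;
}
```

For example, expand_flag ("IMPLY", 'D', buf, sizeof buf) leaves IMPLIED in
buf, while expand_flag ("WE", 'D', buf, sizeof buf) returns 0 because WED
is shorter than four letters.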
SHAR_EOF
fi # end of overwriting check
echo shar: extracting "'UPDATE'" '(5252 characters)'
if test -f 'UPDATE'
then
echo shar: will not over-write existing file "'UPDATE'"
else
sed 's/^X //' << \SHAR_EOF > 'UPDATE'
X Ispell enhancements - 3/13/87
X
X (See three companion postings in net.sources.bugs).
X
X Here are the enhancements to ispell that I mentioned a couple of days ago.
X Because of the number of changes, several of the context diff's are bigger
X than the original files. In addition, many people have gotten confused
X about versions, since enhancements/fixes have been made by six different
X people, counting myself (for the list, see the end of ispell.man). I
X have integrated all of these fixes and enhancements in one place.
X
X For these reasons, I have decided to repost all of the sources for ispell,
X with one exception -- the dictionary. (A couple of small files, such
X as ispell.el, are unchanged, but I decided to repost them anyway for
X completeness. If you didn't have ispell before, you now need only the
X dictionary).
X
X The dictionary is a special case: if you think about it, even ordinary
X diff's will always work with "patch" on that each-line-is-unique file.
X An out-of-place insertion can be corrected by sorting the dictionary
X after patching (something that is done anyway as a side effect of the
X new "munchlist" script). Because of this, I have decided not to repost
X the sizable dictionary. In the process of testing this code, it occurred
X to me to run dict.191 through UNIX "spell"; the results of that are
X given in three companion postings in net.sources.bugs, which seemed
X like a more appropriate place for the diffs. (The postings are not
X divided because of their size; see comments in the postings for my
X reasons).
X
X Now, here's what I've done:
X
X In ispell itself:
X
X - The personal dictionary is now hashed, just like the main one, and
X supports suffixes just like the main one. (It's not actually
X integrated with the main one, because expanding the main one
X is inefficient and poses a minor but troublesome technical
X problem). A personal dictionary of 28000+ words can be read in
X within a few minutes (hey, nobody's perfect -- whatcha doing
X with such a big dictionary anyway? :-).
X - New option "-c" is used by the new munchlist script to generate
X suggested root/suffix combinations.
X - The -d option can now specify /dev/null, if you want to use
X only your personal dictionary (this also saves startup time
X with -c, and is used by the "munchlist" script, which is why
X I put it in).
X - The -p option is now more flexible about its handling of pathnames.
X An absolute pathname is always interpreted literally. A
X relative pathname from WORDLIST is looked up in $HOME first,
X then in the current directory. The -p option behaves in the
X reverse fashion: current directory first, then $HOME. This
X behavior seems more intuitive to me; I'd be interested in
X opinions of others if you don't find it intuitive.
X - Perhaps most important, I have completely overhauled the logic
X in good.c, so that it (I think) matches what the README file
X says it should, no more, no less. The code has been extensively
X tested, notably by interaction with the new expansion scripts;
X nevertheless because of the extent of the changes and the
X nature of the logic, I'd suggest a bit of suspicion for a while.
X A technique we've found useful here is to do your normal work
X with ispell, and then do a final check with UNIX spell or some
X other slow, inconvenient program to make sure ispell didn't
X screw up.
X
X New scripts:
X
X - expand.awk: an obsolete (but correct) awk script that does
X the same thing as expand[12].sed, except slower. The awk
X script is also much easier to understand than the sed scripts.
X Superseded by the sed scripts, except for very short input.
X - expand[12].sed: the sed pipe
X
X "sed -f expand1.sed $file | sed -f expand2.sed"
X
X where "$file" is a raw dictionary file with suffixes
X (e.g., dict.191), generates a list of each root alone, plus
X the root expanded with each possible suffix (e.g.,
X "BOTH/R/Z" produces "BOTH", "BOTHER", and "BOTHERS"). The
X output should usually be sorted with the -u switch before
X further processing. These scripts are used by 'munchlist';
X they are also useful for (a) checking an ispell dictionary
X with some other spell-checking program and (b) figuring
X out what a particular suffix does to a certain word without
X reading the README file.
X - munchlist.sh: a slow, but effective, shell script that takes
X lists of expanded or unexpanded words as input and reduces
X them to a (usually smaller) list of roots and suffixes. The
X result is written to standard output. I think the documentation
X forgot to mention the input must be one word per line. I
X have successfully used this script to combine dict.191 with
X /usr/dict/words; it's also useful (and a lot faster) on
X private dictionaries. For private dictionaries, it will also
X remove any word that has since been added to the main dictionary.
X
X Oh yes, I almost forgot: the original documentation didn't mention
X that ispell is a long-name program. If your "File:" display on the
X top line actually contains the misspelled word, you have long-name problems.
X My fixes don't address long names, because I finally have a way to
X compile long-name programs, thanks to "hash8".
X
X Geoff Kuenning
X geoff@ITcorp.COM
X ...!trwrb!desint!geoff
SHAR_EOF
fi # end of overwriting check
echo shar: extracting "'WISHES'" '(2317 characters)'
if test -f 'WISHES'
then
echo shar: will not over-write existing file "'WISHES'"
else
sed 's/^X //' << \SHAR_EOF > 'WISHES'
X Things remaining to be done to ispell:
X
X - The "munchlist" script can actually increase the size of a
X dictionary. For example, munching dict.191 (after my bug fixes
X to it) reduced the number of words by about 40, but increased
X the number of characters by a small percentage. This is
X because munchlist doesn't recognize duplicate suffixes that
X generate the same result, except for the three special
X "s-ending" suffixes J, Z, and X. For example, right now
X munchlist will make BATHER by adding the R suffix to both
X BATH and BATHE. It would be nice if munchlist could recognize
X the redundancy and reduce its output so that each word was made
X in only one way.
X - The characters in the -w option should be written to the header
X of the hash file, and to a header in the personal dictionary,
X so the user doesn't have to remember to specify them every time.
X - Buildhash should support the -w option.
X - Buildhash, munchlist, icombine, and the expand scripts should use
X a character other than slash for the flag separator, so that slashes
X are available to the -w option. I tend to lean towards commas.
X - It might be nice to support multiple personal dictionaries. On
X the other hand, it's pretty easy to combine them with "cat".
X - Good.c should be table-driven, so that it is easier to modify for
X other languages. Ideally, it would support prefixes as well.
X - A small amount of string space could be saved if buildhash would
X combine strings with common suffixes (e.g., "and" could be stored
X as a pointer to the tail of "bland").
X - (Peter Wan) Ispell should have a "server mode" for large sites, to
X get rid of the time needed to read in the dictionary. On System V,
X this could be accomplished by having the first execution of ispell
X read the dictionary into a shared-memory region. Later incarnations
X would then get the dictionary by just attaching to the region.
X One problem would be that the dictionary gets modified during
X the run, so you might still have to do a memory-to-memory copy
X after the attach. The size of having two copies of the dictionary
X might prohibit this on many machines. Another approach is a
X message-based "good.c server", but this too would have to deal
X with the possibility of modifying the dictionary.
SHAR_EOF
fi # end of overwriting check
echo shar: extracting "'buildhash.c'" '(10320 characters)'
if test -f 'buildhash.c'
then
echo shar: will not over-write existing file "'buildhash.c'"
else
sed 's/^X //' << \SHAR_EOF > 'buildhash.c'
X /* -*- Mode: Text -*- */
X
X #define MAIN
X
X /*
X * buildhash.c - make a hash table for ispell
X *
X * Pace Willisson, 1983
X */
X
X #include <ctype.h>
X #include <stdio.h>
X #ifdef USG
X #include <sys/types.h>
X #endif
X #include <sys/param.h>
X #include <sys/stat.h>
X #include "config.h"
X #include "ispell.h"
X
X #define NSTAT 100
X struct stat dstat, cstat;
X
X int numwords, hashsize;
X
X char *malloc();
X char *realloc ();
X
X struct dent *hashtbl;
X
X char *Dfile;
X char *Hfile;
X
X char Cfile[MAXPATHLEN];
X char Sfile[MAXPATHLEN];
X
X main (argc,argv)
X int argc;
X char **argv;
X {
X FILE *countf;
X FILE *statf;
X int stats[NSTAT];
X int i;
X
X if (argc > 1) {
X ++argv;
X Dfile = *argv;
X if (argc > 2) {
X ++argv;
X Hfile = *argv;
X }
X else
X Hfile = DEFHASH;
X }
X else {
X Dfile = DEFDICT;
X Hfile = DEFHASH;
X }
X
X sprintf(Cfile,"%s.cnt",Dfile);
X sprintf(Sfile,"%s.stat",Dfile);
X
X if (stat (Dfile, &dstat) < 0) {
X fprintf (stderr, "No dictionary (%s)\n", Dfile);
X exit (1);
X }
X
X if (stat (Cfile, &cstat) < 0 || dstat.st_mtime > cstat.st_mtime)
X newcount ();
X
X if ((countf = fopen (Cfile, "r")) == NULL) {
X fprintf (stderr, "No count file\n");
X exit (1);
X }
X numwords = 0;
X fscanf (countf, "%d", &numwords);
X fclose (countf);
X if (numwords == 0) {
X fprintf (stderr, "Bad count file\n");
X exit (1);
X }
X hashsize = numwords;
X readdict ();
X
X if ((statf = fopen (Sfile, "w")) == NULL) {
X fprintf (stderr, "Can't create %s\n", Sfile);
X exit (1);
X }
X
X for (i = 0; i < NSTAT; i++)
X stats[i] = 0;
X for (i = 0; i < hashsize; i++) {
X struct dent *dp;
X int j;
X if (hashtbl[i].used == 0) {
X stats[0]++;
X } else {
X for (j = 1, dp = &hashtbl[i]; dp->next != NULL; j++, dp = dp->next)
X ;
X if (j >= NSTAT)
X j = NSTAT - 1;
X stats[j]++;
X }
X }
X for (i = 0; i < NSTAT; i++)
X fprintf (statf, "%d: %d\n", i, stats[i]);
X fclose (statf);
X
X filltable ();
X
X output ();
X exit(0);
X }
X
X output ()
X {
X FILE *outfile;
X struct hashheader hashheader;
X int strptr, n, i;
X
X if ((outfile = fopen (Hfile, "w")) == NULL) {
X fprintf (stderr, "can't create %s\n",Hfile);
X return;
X }
X hashheader.magic = MAGIC;
X hashheader.stringsize = 0;
X hashheader.tblsize = hashsize;
X fwrite (&hashheader, sizeof hashheader, 1, outfile);
X strptr = 0;
X for (i = 0; i < hashsize; i++) {
X n = strlen (hashtbl[i].word) + 1;
X #ifdef CAPITALIZE
X if (hashtbl[i].followcase)
X n += (hashtbl[i].word[n] & 0xFF) * (n + 1) + 1;
X #endif
X fwrite (hashtbl[i].word, n, 1, outfile);
X hashtbl[i].word = (char *)strptr;
X strptr += n;
X }
X /* Pad file to a struct dent boundary for efficiency. */
X n = (strptr + sizeof hashheader) % sizeof (struct dent);
X if (n != 0) {
X n = sizeof (struct dent) - n;
X strptr += n;
X while (--n >= 0)
X putc ('\0', outfile);
X }
X for (i = 0; i < hashsize; i++) {
X if (hashtbl[i].next != 0) {
X int x;
X x = hashtbl[i].next - hashtbl;
X hashtbl[i].next = (struct dent *)x;
X } else {
X hashtbl[i].next = (struct dent *)-1;
X }
X }
X fwrite (hashtbl, sizeof (struct dent), hashsize, outfile);
X hashheader.stringsize = strptr;
X rewind (outfile);
X fwrite (&hashheader, sizeof hashheader, 1, outfile);
X fclose (outfile);
X }
X
X filltable ()
X {
X struct dent *freepointer, *nextword, *dp;
X int i;
X
X for (freepointer = hashtbl; freepointer->used; freepointer++)
X ;
X for (nextword = hashtbl, i = numwords; i != 0; nextword++, i--) {
X if (nextword->used == 0) {
X continue;
X }
X if (nextword->next == NULL) {
X continue;
X }
X if (nextword->next >= hashtbl && nextword->next < hashtbl + hashsize) {
X continue;
X }
X dp = nextword;
X while (dp->next) {
X if (freepointer >= hashtbl + hashsize) {
X fprintf (stderr, "table overflow\n");
X getchar ();
X break;
X }
X *freepointer = *(dp->next);
X dp->next = freepointer;
X dp = freepointer;
X
X while (freepointer->used)
X freepointer++;
X }
X }
X }
X
X
X readdict ()
X {
X struct dent d;
X register struct dent *dp;
X char lbuf[100];
X FILE *dictf;
X int i;
X int h;
X int len;
X register char *p;
X
X if ((dictf = fopen (Dfile, "r")) == NULL) {
X fprintf (stderr, "Can't open dictionary\n");
X exit (1);
X }
X
X hashtbl = (struct dent *) calloc (numwords, sizeof (struct dent));
X if (hashtbl == NULL) {
X fprintf (stderr, "couldn't allocate hash table\n");
X exit (1);
X }
X
X i = 0;
X while (fgets (lbuf, sizeof lbuf, dictf) != NULL) {
X if ((i & 1023) == 0) {
X printf ("%d ", i);
X fflush (stdout);
X }
X i++;
X
X p = &lbuf [ strlen (lbuf) - 1 ];
X if (*p == '\n')
X *p = 0;
X
X if (makedent (lbuf, &d) < 0)
X continue;
X
X len = strlen (lbuf);
X #ifdef CAPITALIZE
X if (d.followcase)
X d.word = malloc (2 * len + 4);
X else
X #endif
X d.word = malloc (len + 1);
X if (d.word == NULL) {
X fprintf (stderr, "couldn't allocate space for word %s\n", lbuf);
X exit (1);
X }
X strcpy (d.word, lbuf);
X #ifdef CAPITALIZE
X if (d.followcase) {
X p = d.word + len + 1;
X *p++ = 1; /* Count of capitalizations */
X *p++ = '-'; /* Don't keep in pers dict */
X strcpy (p, lbuf);
X
X }
X for (p = d.word; *p; p++) {
X if (mylower (*p))
X *p = toupper (*p);
X }
X #endif
X
X h = hash (d.word, len, hashsize);
X
X dp = &hashtbl[h];
X if (dp->used == 0) {
X *dp = d;
X } else {
X
X #ifdef CAPITALIZE
X while (dp != NULL && strcmp (dp->word, d.word) != 0)
X dp = dp->next;
X if (dp != NULL) {
X if (d.followcase
X || (dp->followcase && !d.allcaps
X && !d.capitalize)) {
X /* Add a specific capitalization */
X if (dp->followcase) {
X p = &dp->word[len + 1];
X (*p)++; /* Bump counter */
X dp->word = realloc (dp->word,
X ((*p & 0xFF) + 1) * (len + 2));
X if (dp->word == NULL) {
X fprintf (stderr,
X "couldn't allocate space for word %s\n",
X lbuf);
X exit (1);
X }
X p = &dp->word[len + 1];
X p += ((*p & 0xFF) - 1) * (len + 2) + 1;
X *p++ = '-';
X strcpy (p,
X d.followcase ? &d.word[len + 3] : lbuf);
X }
X else {
X /* d.followcase must be true */
X /* thus, d.capitalize and d.allcaps are */
X /* clear */
X free (dp->word);
X dp->word = d.word;
X dp->followcase = 1;
X dp->k_followcase = 1;
X /* Code later will clear dp->allcaps. */
X }
X }
X /* Combine two capitalizations. If d was */
X /* allcaps, dp remains unchanged */
X if (d.allcaps == 0) {
X /* dp is the entry that will be kept. If */
X /* dp is followcase, the capitalize flag */
X /* reflects whether capitalization "may" */
X /* occur. If not, it reflects whether it */
X /* "must" occur. */
X if (d.capitalize) { /* ie lbuf was cap'd */
X if (dp->followcase)
X dp->capitalize = 1; /* May */
X else if (dp->allcaps) /* ie not lcase */
X dp->capitalize = 1; /* Must */
X }
X else { /* lbuf was followc or all-lc */
X if (!dp->followcase)
X dp->capitalize = 0; /* May */
X }
X dp->k_capitalize = dp->capitalize;
X dp->allcaps = 0;
X dp->k_allcaps = 0;
X }
X }
X else {
X #endif
X dp = (struct dent *) malloc (sizeof (struct dent));
X if (dp == NULL) {
X fprintf (stderr,
X "couldn't allocate space for collision\n");
X exit (1);
X }
X *dp = d;
X dp->next = hashtbl[h].next;
X hashtbl[h].next = dp;
X #ifdef CAPITALIZE
X } /* closes the "else" opened under CAPITALIZE */
X #endif
X }
X }
X printf ("\n");
X }
X
X /*
X * fill in the flags in d, and put a null after the word in lbuf
X */
X
X makedent (lbuf, d)
X char *lbuf;
X struct dent *d;
X {
X char *p, *index();
X
X d->next = NULL;
X d->used = 1;
X d->v_flag = 0;
X d->n_flag = 0;
X d->x_flag = 0;
X d->h_flag = 0;
X d->y_flag = 0;
X d->g_flag = 0;
X d->j_flag = 0;
X d->d_flag = 0;
X d->t_flag = 0;
X d->r_flag = 0;
X d->z_flag = 0;
X d->s_flag = 0;
X d->p_flag = 0;
X d->m_flag = 0;
X d->keep = 0;
X #ifdef CAPITALIZE
X d->allcaps = 0;
X d->capitalize = 0;
X d->followcase = 0;
X /*
X ** Figure out the capitalization rules from the capitalization of
X ** the sample entry. Only one of followcase, allcaps, and capitalize
X ** will be set. Combinations are generated by higher-level code.
X */
X for (p = lbuf; *p && *p != '/'; p++) {
X if (mylower (*p))
X break;
X }
X if (*p == '\0' || *p == '/')
X d->allcaps = 1;
X else {
X for ( ; *p && *p != '/'; p++) {
X if (myupper (*p))
X break;
X }
X if (*p == '\0' || *p == '/') {
X /*
X ** No uppercase letters follow the lowercase ones.
X ** If the first two letters are capitalized, it's
X ** "followcase". If the first one is capitalized, it's
X ** "capitalize".
X */
X if (myupper (lbuf[0])) {
X if (myupper (lbuf[1]))
X d->followcase = 1;
X else
X d->capitalize = 1;
X }
X }
X else
X d->followcase = 1; /* .../lower/upper */
X }
X d->k_allcaps = d->allcaps ;
X d->k_capitalize = d->capitalize;
X d->k_followcase = d->followcase;
X #endif
X
X p = index (lbuf, '/');
X if (p != NULL)
X *p = 0;
X if (strlen (lbuf) > WORDLEN - 1) {
X printf ("%s: word too big\n", lbuf);
X return (-1);
X }
X
X if (p == NULL)
X return (0);
X
X p++;
X while (*p != '\0' && *p != '\n') {
X if (mylower (*p))
X *p = toupper (*p);
X switch (*p) {
X case 'V': d->v_flag = 1; break;
X case 'N': d->n_flag = 1; break;
X case 'X': d->x_flag = 1; break;
X case 'H': d->h_flag = 1; break;
X case 'Y': d->y_flag = 1; break;
X case 'G': d->g_flag = 1; break;
X case 'J': d->j_flag = 1; break;
X case 'D': d->d_flag = 1; break;
X case 'T': d->t_flag = 1; break;
X case 'R': d->r_flag = 1; break;
X case 'Z': d->z_flag = 1; break;
X case 'S': d->s_flag = 1; break;
X case 'P': d->p_flag = 1; break;
X case 'M': d->m_flag = 1; break;
X case 0:
X fprintf (stderr, "no flags on word %s\n", lbuf);
X continue;
X default:
X fprintf (stderr, "unknown flag %c word %s\n",
X *p, lbuf);
X break;
X }
X p++;
X if (*p == '/') /* Handle old-format dictionaries too */
X p++;
X }
X return (0);
X }
X
X newcount ()
X {
X char buf[200];
X char lastbuf[200];
X FILE *d;
X int i;
X register char *cp;
X
X fprintf (stderr, "Counting words in dictionary ...\n");
X
X if ((d = fopen (Dfile, "r")) == NULL) {
X fprintf (stderr, "Can't open dictionary\n");
X exit (1);
X }
X
X for (i = 0, lastbuf[0] = '\0'; fgets (buf, sizeof buf, d); ) {
X for (cp = buf; *cp; cp++) {
X if (mylower (*cp))
X *cp = toupper (*cp);
X }
X if (strcmp (buf, lastbuf) != 0) {
X if ((++i & 1023) == 0) {
X printf ("%d ", i);
X fflush (stdout);
X }
X strcpy (lastbuf, buf);
X }
X }
X fclose (d);
X printf ("\n%d words\n", i);
X if ((d = fopen (Cfile, "w")) == NULL) {
X fprintf (stderr, "can't create %s\n", Cfile);
X exit (1);
X }
X fprintf (d, "%d\n", i);
X fclose (d);
X }
SHAR_EOF
fi # end of overwriting check
echo shar: extracting "'config.X'" '(4594 characters)'
if test -f 'config.X'
then
echo shar: will not over-write existing file "'config.X'"
else
sed 's/^X //' << \SHAR_EOF > 'config.X'
X /*
X * This is the configuration file for ispell. Thanks to Bob McQueer
X * for creating it and making the necessary changes elsewhere to
X * support it.
X * Look through this file from top to bottom, and edit anything that
X * needs editing. There are also five or six variables in the
X * Makefile that you must edit. Note that the Makefile edits this
X * file (config.X) to produce config.h. If you are looking at
X * config.h, you're in the wrong file.
X *
X * Don't change the funny-looking lines with !!'s in them; see the
X * Makefile!
X */
X
X /*
X ** library directory for hash table(s) / default hash table name
X ** If you intend to use multiple dictionary files, I would suggest
X ** LIBDIR be a directory which will contain nothing else, so sensible
X ** names can be constructed for the -d option without conflict.
X */
X #ifndef LIBDIR
X #define LIBDIR "!!LIBDIR!!"
X #endif
X #ifndef DEFHASH
X #define DEFHASH "!!DEFHASH!!"
X #endif
X
X #ifdef USG
X #define index strchr
X #define rindex strrchr
X #endif
X
X /* environment variable for user's word list */
X #ifndef PDICTVAR
X #define PDICTVAR "WORDLIST"
X #endif
X
X /* default word list */
X #ifndef DEFPDICT
X #define DEFPDICT ".ispell_words"
X #endif
X
X /* environment variable for include file string */
X #ifndef INCSTRVAR
X #define INCSTRVAR "INCLUDE_STRING"
X #endif
X
X /* default include string */
X #ifndef DEFINCSTR
X #define DEFINCSTR "&Include_File&"
X #endif
X
X /* mktemp template for temporary file - MUST contain 6 consecutive X's */
X #ifndef TEMPNAME
X #define TEMPNAME "/tmp/ispellXXXXXX"
X #endif
X
X /* default dictionary file */
X #ifndef DEFDICT
X #define DEFDICT "!!DEFDICT!!"
X #endif
X
X /* path to LOOK (if look(1) command is available) */
X #ifndef LOOK
X #undef LOOK
X #endif
X
X /* path to egrep (use speeded up version if available) */
X #ifndef EGREPCMD
X #define EGREPCMD "/bin/egrep"
X #endif
X
X /* path to wordlist for Lookup command (typically /usr/dict/{words|web2}) */
X #ifndef WORDS
X #define WORDS "/usr/dict/words"
X #endif
X
X /* buffer size to use for file names if not in sys/param.h */
X #ifndef MAXPATHLEN
X #define MAXPATHLEN 240
X #endif
X
X /* word length allowed in dictionary by buildhash */
X #define WORDLEN 30
X
X /* suppress the 8-bit character feature */
X #ifndef NO8BIT
X #define NO8BIT
X #endif
X
X /* maximum number of include files supported by xgets; set to 0 to disable */
X #ifndef MAXINCLUDEFILES
X #define MAXINCLUDEFILES 5
X #endif
X
X /* Approximate number of words in the full dictionary, after munching.
X ** Err on the high side unless you are very short on memory, in which
X ** case you might want to change the tables in tree.c and also increase
X ** MAXPCT.
X **
X ** (Note: dict.191 is a bit over 15000 words. dict.191 munched with
X ** /usr/dict/words is a little over 28000).
X */
X #ifndef BIG_DICT
X #define BIG_DICT 29000
X #endif
X
X /*
X ** Maximum hash table fullness percentage. Larger numbers save space
X ** at the cost of lookup time.
X **/
X #ifndef MAXPCT
X #define MAXPCT 70 /* Expand table when 70% full */
X #endif
X
X /*
X ** the isXXXX macros normally only check ASCII range. These are used
X ** instead for text characters, which we assume may be 8 bit. The
X ** NO8BIT ifdef shuts off significance of 8 bit characters. If you are
X ** using this, and your ctype.h already masks, you can simplify.
X */
X #ifdef NO8BIT
X #define myupper(X) isupper((X)&0x7f)
X #define mylower(X) islower((X)&0x7f)
X #define myspace(X) isspace((X)&0x7f)
X #define myalpha(X) isalpha((X)&0x7f)
X #else
X #define myupper(X) (!((X)&0x80) && isupper(X))
X #define mylower(X) (!((X)&0x80) && islower(X))
X #define myspace(X) (!((X)&0x80) && isspace(X))
X #define myalpha(X) (!((X)&0x80) && isalpha(X))
X #endif
X
X /*
X ** the NOPARITY mask is applied to user input characters from the terminal
X ** in order to mask out the parity bit.
X */
X #ifdef NO8BIT
X #define NOPARITY 0x7f
X #else
X #define NOPARITY 0xff
X #endif
X
X
X /*
X ** the terminal mode for ispell, set to CBREAK or RAW
X **
X */
X #ifndef TERM_MODE
X #define TERM_MODE CBREAK
X #endif
X
X /*
X ** Define this if you want your columns of words to be of equal length.
X ** This will spread short word lists across the screen instead of down it.
X */
X #ifndef EQUAL_COLUMNS
X #undef EQUAL_COLUMNS
X #endif
X
X /*
X ** This is the extension that will be added to backup files
X */
X #ifndef BAKEXT
X #define BAKEXT ".bak"
X #endif
X
X /*
X ** Define this if you want the capitalization feature. This will increase
X ** the size of the hashed dictionary on most 16-bit and some 32-bit machines.
X */
X #ifndef CAPITALIZE
X #define CAPITALIZE
X #endif
X
X /*
X ** Define this if you want your personal dictionary sorted. This may take
X ** a long time for very large dictionaries. Dictionaries larger than
X ** SORTPERSONAL words will not be sorted.
X */
X #ifndef SORTPERSONAL
X #define SORTPERSONAL 1000
X #endif
SHAR_EOF
fi # end of overwriting check
echo shar: extracting "'fixdict.sh'" '(2502 characters)'
if test -f 'fixdict.sh'
then
echo shar: will not over-write existing file "'fixdict.sh'"
else
sed 's/^X //' << \SHAR_EOF > 'fixdict.sh'
X : Use /bin/sh
X #
X # Add capitalization information to an ispell dictionary
X #
X # Usage:
X #
X # fixdict dict-file
X #
X # Requires availability of UNIX spell. The new dictionary is
X # rewritten in place. A list of words that couldn't be
X # resolved (because spell doesn't know them) is written to
X # standard output. This list appears in lowercase in the
X # dictionary, and if there are any errors they must be edited
X # by hand.
X #
X # The final dictionary appears in expanded form and must be
X # passed through munchlist to regenerate suffixes.
X #
X LIBDIR=/tmp2/lib
X EXPAND1=${LIBDIR}/isexp1.sed
X EXPAND2=${LIBDIR}/isexp2.sed
X EXPAND3=${LIBDIR}/isexp3.sed
X EXPAND4=${LIBDIR}/isexp4.sed
X TDIR=${TMPDIR:-/tmp}
X TMP=${TDIR}/fix$$
X
X trap "/bin/rm -f ${TMP}*; exit 1" 1 2 15
X sed -f ${EXPAND1} $1 | sed -f ${EXPAND2} \
X | sed -f ${EXPAND3} | sed -f ${EXPAND4} \
X | tr '[A-Z]' '[a-z]' \
X | spell \
X | sort > ${TMP}a
X #
X # ${TMP}a contains all the words that spell doesn't like.
X # Now figure out which of those are because spell doesn't know them at
X # all, and leave those in ${TMP}b.
X #
X tr '[a-z]' '[A-Z]' < ${TMP}a | spell | tr '[A-Z]' '[a-z]' > ${TMP}b
X #
X # The wrongly-capitalized words are those that spell didn't object to
X # in the last step. Produce a list of them, and capitalize the
X # first letter of each. Save this list in ${TMP}c.
X #
X comm -23 ${TMP}a ${TMP}b \
X | sed 's/^a/A/;s/^b/B/;s/^c/C/;s/^d/D/;s/^e/E/;s/^f/F/;s/^g/G/;s/^h/H/
X s/^i/I/;s/^j/J/;s/^k/K/;s/^l/L/;s/^m/M/;s/^n/N/;s/^o/O/;s/^p/P/
X s/^q/Q/;s/^r/R/;s/^s/S/;s/^t/T/;s/^u/U/;s/^v/V/;s/^w/W/;s/^x/X/
X s/^y/Y/;s/^z/Z/' > ${TMP}c
X #
X # Find out which of those spell objects to, saving the failures in ${TMP}d.
X #
X spell ${TMP}c > ${TMP}d
X #
X # Extract the words which were correctly capitalized at the first letter,
X # combine them with an all-capitals version of the ones that weren't, and
X # put the result into ${TMP}e.
X #
X (comm -23 ${TMP}c ${TMP}d; tr '[a-z]' '[A-Z]' < ${TMP}d) \
X | sort -o ${TMP}e
X #
X # At this point, ${TMP}b contains the words that spell just plain doesn't
X # like, and ${TMP}e contains the words that are now capitalized correctly.
X #
X /bin/rm ${TMP}[cd]
X #
X # Put it all together, rewriting the dictionary in place.
X #
X sed -f ${EXPAND1} $1 | sed -f ${EXPAND2} \
X | sed -f ${EXPAND3} | sed -f ${EXPAND4} \
X | tr '[A-Z]' '[a-z]' \
X | sort \
X | comm -23 - ${TMP}a \
X | sort -f -o $1 - ${TMP}b ${TMP}e
X #
X # Finally, write the list of words that have questionable capitalization
X # to the standard output.
X #
X cat ${TMP}b
X /bin/rm ${TMP}*
SHAR_EOF
chmod +x 'fixdict.sh'
fi # end of overwriting check
# End of shell archive
exit 0
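
A reading aid for makedent() in buildhash.c above. The capitalization
flags are derived from the sample entry by two scans: first for a
lowercase letter, then for an uppercase letter after it. What follows is
a minimal standalone sketch of that classification, not the ispell code
itself. The enum and the name classify_case() are hypothetical, and plain
ctype.h calls stand in for the mylower()/myupper() macros from config.X.
In the real code an all-lowercase word simply leaves all three flags
clear; CASE_LOWER is a label invented here for that state.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical labels for the states makedent() encodes as flag bits. */
enum case_kind { CASE_LOWER, CASE_ALLCAPS, CASE_CAPITALIZE, CASE_FOLLOW };

/* Classify the word part of a dictionary line (the text before any '/'). */
enum case_kind classify_case(const char *word)
{
    const char *p;

    assert(word != NULL);

    /* Scan 1: stop at the first lowercase letter, if there is one. */
    for (p = word; *p != '\0' && *p != '/'; p++) {
        if (islower((unsigned char) *p))
            break;
    }
    if (*p == '\0' || *p == '/')
        return CASE_ALLCAPS;        /* no lowercase letters at all */

    /* Scan 2: look for an uppercase letter after that lowercase one. */
    for ( ; *p != '\0' && *p != '/'; p++) {
        if (isupper((unsigned char) *p))
            break;
    }
    if (*p != '\0' && *p != '/')
        return CASE_FOLLOW;         /* upper after lower: keep exact case */

    /* Only leading capitals (if any) precede the lowercase letters. */
    if (isupper((unsigned char) word[0])) {
        if (isupper((unsigned char) word[1]))
            return CASE_FOLLOW;     /* two or more leading capitals */
        return CASE_CAPITALIZE;     /* exactly one leading capital */
    }
    return CASE_LOWER;              /* nothing capitalized */
}
```

Under this sketch, "NASA/S" classifies as all-caps, "Mentor" as
capitalize, "TeX" and "ITcorp" as followcase, and "word" as plain
lowercase. Combinations of these states for the same word are produced
later, by the collision-handling code in the hash loop.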
--
Brandon S. Allbery {decvax,cbatt,cbosgd}!cwruecmp!ncoast!allbery
Tridelta Industries {ames,mit-eddie,talcott}!necntc!ncoast!allbery
7350 Corporate Blvd. necntc!ncoast!allbery@harvard.HARVARD.EDU
Mentor, OH 44060 +01 216 255 1080 (also eddie.MIT.EDU)