[net.sources] correct misspellings in a text file

rik@ucla-cs.UUCP (05/06/85)

Here's a program I use all the time to correct misspellings in a text
file.  It differs from the recently posted "spellfix" in that it makes
the changes to the file directly.  It is a Bourne shell script, but I
have heard that a similar program (written in C, and therefore probably
more efficient) was posted a couple of years ago.  If somebody has a
copy of that, please post it so that we can compare.
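
For anyone who wants the gist before unpacking the archive, here is a
rough sketch of the two ideas the script rests on: filtering the list of
suspect words against a personal dictionary with comm(1), and writing an
ex(1) command that substitutes a whole word throughout the file.  The
function names (unknown_words, global_subst) and the temporary file names
are invented for this illustration and do not appear in the archive; the
real script feeds the output of spell(1) into the first step, whereas this
sketch takes any file of words, one per line.

```shell
#!/bin/sh
# Illustrative sketch only: these helper names are hypothetical and are
# not part of the "correct" script in the archive.

# Print the words listed in $1 (one per line, e.g. saved spell output)
# that do not appear in the personal dictionary $2.
unknown_words() {
    tmp=${TMPDIR:-/tmp}/unknown$$
    # comm needs both inputs sorted the same way, so sort both in the C locale.
    LC_ALL=C sort -u "$1" > "$tmp.words"
    LC_ALL=C sort -u "$2" > "$tmp.dict"
    # -23 suppresses lines unique to the dictionary and lines common to both,
    # leaving only the words that are missing from the dictionary.
    LC_ALL=C comm -23 "$tmp.words" "$tmp.dict"
    rm -f "$tmp.words" "$tmp.dict"
}

# Print an ex command that replaces the whole word $1 with $2 on every line.
# \< and \> are the word-boundary anchors understood by ex.
global_subst() {
    printf 'g/\\<%s\\>/s//%s/g\n' "$1" "$2"
}
```

A caller would collect the lines printed by global_subst into a script
file, append "w" and "q", and hand that file to ex on standard input.
Neither word is escaped here, so a word containing characters special to
ex (/, $, *, and so on) would need extra care.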

Rik Verstraete.
ARPA: rik@UCLA-CS.ARPA
UUCP: ...!{cepu,ihnp4,trwspp,ucbvax}!ucla-cs!rik
------------------------------------------------------------------------
#!/bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #!/bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#	correct.1
#	correct
# This archive created: Mon May  6 09:48:33 1985
export PATH; PATH=/bin:$PATH
echo cshar: extracting "'correct.1'" '(4126 characters)'
if test -f 'correct.1'
then
	echo cshar: over-writing existing file "'correct.1'"
fi
sed 's/^X//' << \SHAR_EOF > 'correct.1'
X.TH CORRECT 1P 12/21/83
X.SH NAME
Xcorrect \- program to find and correct spelling errors
X.SH SYNOPSIS
X.B correct
X[
X.B \-\fIn\fP
X]
X[
X.B \-\fId\fP
X.B \fIdictionary\fP
X] file
X.SH DESCRIPTION
X.I correct
Xis a program that
Xfinds the spelling errors in a file,
Xedits this file to correct the mistakes,
Xand allows the user to maintain a dictionary file.
XUsing
X.I correct
Xhas been preferable to running
X\fIspell\fR(1)
Xand then editing the file.
X.PP
XFirst,
Xthe program runs \fIspell\fR(1).
XThe list of errors produced is
Xchecked against the entries in a user-defined dictionary file,
Xand the ones not in the dictionary are passed along.
XThe default dictionary is $HOME/dictionary
X(which need not exist;
X.I correct
Xwill create it if necessary).
XA different dictionary file can be
Xspecified by using the
X.I \-d
Xflag.
X.PP
XIf the
X.I \-n
Xoption is not used (see below),
Xthe program then presents each error to the user,
Xoffering five options:
X.TP
X.I a[dd]
Xadd the word to the user dictionary file.
X.TP
X.I s[ubstitute]
Xsubstitute the word throughout the entire file.
X.TP
X.I c[heck]
Xcheck the word in a line-by-line context
X(with option to edit at each place).
X.TP
X.I l[ook]
Xlook up the word in the system dictionary
X(via the \fIlook\fR(1) command).
X.TP
X<return>
Xno action.
X.PP
XWhen all the errors have been processed,
X.I correct
Xgives the user a final chance to abort the
Xcorrections by prompting for yes/no.
XA reply `yes' means that the original file will be changed
Xto reflect the corrections of the spelling errors.
XFinally, the dictionary is sorted
Xand all temporary workspace is removed.
XNote that
X.I correct
Xalways sorts the dictionary file,
Xeven if no errors were found.
X.sp
XWith the
X.I \-n
Xflag,
Xno editing is done;
X.I correct
Xjust checks the user lexicon
Xand sends the resulting list to the standard output
X(looks like \fIspell\fR(1) except that
Xthe list does not contain entries from the dictionary, and
Xis sorted in \s-2ASCII\s+2 rather than alphabetical order).
X.sp
XThe
X.I \-d
Xflag is used to specify a different dictionary.
XThe flag should be followed by a file name,
Xwhich will be taken as the dictionary,
Xinstead of $HOME/dictionary.
XNote that the
X.I \-n
Xflag, the
X.I \-d
Xflag (with dictionary file name),
Xand the file itself,
Xcan be specified in any order.
X.LP
X.I Restriction:
XThis program only works on
X.I one
Xfile;
Xit can not edit more than one file at a time.
XAlso,
X``.so'' \fItroff\fR(1) commands should not be used
Xin the file to be corrected.
X.SH FILES
X$HOME/dictionary: default dictionary file.
X.SH SEE\ ALSO
Xspell(1),
Xlook(1),
Xsort(1),
Xgrep(1),
Xex(1),
Xsed(1).
X.SH DIAGNOSTICS
X.I correct
Xwill complain if
Xmore than one or no file to be corrected is specified;
Xif the
X.I \-d
Xflag is used without a dictionary file;
Xor if the file to be corrected does not exist.
X.SH BUGS
XA serious bug has to do with
Xthe form of search.
XThe
X.I \-w
Xoption of
X\fIgrep\fR(1)
Xand the /\\<...\\>/ search pattern in
X\fIex\fR(1)
Xare used to find and edit only
Xthose matching letter patterns
Xthat are indeed words.
XA line such as:
X.sp
X.ce
X\\fBGradautes\\fR
X.sp
X(\\fx is the troff command to change to font x)
Xwill cause ``Gradautes'' to come out
Xas an error via
X.I spell .
XHowever,
Xin the editing phase,
X``Gradautes'' will not be picked up as a ``word''
X(because of the fB in front)
Xand will not get changed.
X.PP
XChoosing to check ``line-by-line''
Xleads to problems in the editing stage.
XSpecifically,
Xif the line has $'s, *'s, etc.
X(characters with special meaning to
X.I ex )
Xthen the editing may not work as expected.
XThis problem has been alleviated somewhat
Xby using the ``nomagic'' option of
X\fIex\fR(1),
Xbut it still remains a problem...
X.PP
XAnother minor problem with check line-by-line occurs
Xwhen two identical errors appear
Xin one line.
XRight now,
Xboth (or more) errors will be ``corrected'' in the same way;
Xone cannot change the two words in different ways.
X.PP
XApparently,
X.I correct
Xdoesn't work on all dial-up terminals.
XThat is,
Xthe program runs,
Xbut no corrections are made
X(something to do with
X\fIex\fR(1)
Xand minimum baud rates).
X.SH AUTHORS
XTovah Hollander (``tovah@ucla-cs.arpa'')
Xand Rik Verstraete (``rik@ucla-cs.arpa'').
SHAR_EOF
if test 4126 -ne `wc -c < 'correct.1'`
then
	echo cshar: error transmitting "'correct.1'" '(should have been 4126 characters)'
fi
echo cshar: extracting "'correct'" '(5123 characters)'
if test -f 'correct'
then
	echo cshar: over-writing existing file "'correct'"
fi
sed 's/^X//' << \SHAR_EOF > 'correct'
X:
X: correct: a program to find and correct spelling errors
X:
X: SYNOPSIS: correct [-n] [-d dictionary] file
X: AUTHOR: Tovah Hollander and Rik Verstraete.
X: DATE: Tue Jan 10 14:08:13 PST 1984
X:
X:
Xtrap "/bin/rm -f /tmp/*$$ > /dev/null ; exit" 1 2 15
X: initialize variables
Xsetf=0
Xlexicon=$HOME/dictionary
Xcorrectyn=1
X: all arguments of the command
Xwhile test -n "$1"
X   do case $1 in
X         -n) correctyn=0 ;;
X         -d) shift
X             if test $1
X               then lexicon=$1
X               else echo 'correct: must specify dictionary file with -d flag'
X                    exit
X             fi ;;
X          *) if test $setf -eq 1
X	       then echo 'correct: can work only on one file'
X                    exit
X             fi
X	     file=$1
X             setf=1 ;;
X      esac
X      shift
X   done
X: is a file name specified?
Xif test $setf -eq 0
X  then echo 'correct: must specify a file name'
X       exit
Xfi
X: does file exist?
Xif test ! -f $file
X  then echo 'correct: cannot open '$file
X       exit
Xfi
X: create dictionary if necessary
Xif test ! -f $lexicon
X  then echo '' > $lexicon
Xfi
X: sort dictionary
Xsort -u $lexicon -o $lexicon
X: find all errors not in the dictionary
Xspell $file | sort -u -o /tmp/spell$$
Xcomm -23 /tmp/spell$$ $lexicon > /tmp/errors$$
X: process errors one by one, if any, and if -n flag is not set
Xif test $correctyn -eq 0
X  then cat /tmp/errors$$
Xelif test -s /tmp/errors$$
X  then
X       set `cat /tmp/errors$$`
X       echo '
XChoose for each word:
X a(dd),s(ubstitute),c(heck),l(ook),h(elp),or <return>'
X:
X: process all errors one by one
X:
X       for i
X       do
X         word=$i
X         valid=0
X         until test $valid -eq 1
X           do echo ''
X              echo -n '"'$word'": '
X              read action
X              case $action in
X              a*) echo $word >> $lexicon
X                  valid=1 ;;
X              s*) echo -n '  Substitute: '
X                  read newword
X		  echo "g/\<"$word"\>/s//"$newword"/g" >> /tmp/ex-script$$
X                  valid=1 ;;
X              c*) grep -w $word $file > /tmp/line$$
X                  mode=1
X                  until test `grep -cw $word /tmp/line$$` -eq 0
X                    do echo ''
X                       echo '  Context is:'
X                       line=`head -1 /tmp/line$$`
X		       echo $line
X                       echo -n '  Edit (y/n)? '
X                       read yorn
X                       case $yorn in
X                       y*) if test $mode -eq 1
X                             then mode=0
X                                  echo -n '  Substitute: '
X                                  read newword
X				  echo "/^$line$/s/\<"$word"\>/"$newword"/g" >> /tmp/ex-script$$
X                             else echo -n '  Substitute "'$newword'" (y/n)? '
X                                  read ans
X                                  case $ans in
X                                  y*) echo "/^$line$/s/\<"$word"\>/"$newword"/g" >> /tmp/ex-script$$
X                                      ;;
X                                  *) echo -n '  Substitute: '
X                                     read newword
X				     echo "/^$line$/s/\<"$word"\>/"$newword"/g" >> /tmp/ex-script$$
X                                  esac
X                           fi ;;
X                       n*) echo -n "  Add to dictionary? "
X                           read yorn
X                           case $yorn in
X                           y*) echo $word >> $lexicon
X                           esac
X                       esac
X	  	       sed -e 1d /tmp/line$$ > /tmp/junk$$
X		       mv /tmp/junk$$ /tmp/line$$
X                  done
X                  valid=1 ;;
X              l*) echo -n '  Enter search string: '
X                  read string
X                  look $string > /tmp/look$$
X                  if test ! -s /tmp/look$$
X                    then echo '  No words found.'
X                    else echo '  Words found in system dictionary: '
X                         cat /tmp/look$$
X                  fi ;;
X              h*) echo '
XOPTIONS (choose for each incorrect word):
X   a(dd)        - add the word to the lexicon
X   s(ubstitute) - make a global substitution for the word
X   c(heck)      - check the word in each context
X   l(ook)       - check system dictionary for possible corrections
X   h(elp)       - print this list
X   <return>     - no action' ;;
X             "") valid=1 ;;
X              *) echo '"'$action'" is invalid option.
XChoose for each word:
X a(dd),s(ubstitute),c(heck),l(ook),h(elp),or <return>
X' ;;
X              esac
X           done
X       done
X: make corrections, if any
X       if test -f /tmp/ex-script$$
X         then echo ''
X              echo -n 'Do you want to make all the corrections now? '
X              read yorn
X              case $yorn in
X              y*) echo w $file >> /tmp/ex-script$$
X                  echo q >> /tmp/ex-script$$
X                  ex $file < /tmp/ex-script$$ > /dev/null ;;
X              *)  echo "corrections aborted..."
X              esac
X              rm /tmp/ex-script$$
X       fi
X: clean up and sort lexicon
X       spell $lexicon | sort -u -o $lexicon
Xfi
X/bin/rm -f /tmp/*$$ > /dev/null
SHAR_EOF
if test 5123 -ne `wc -c < 'correct'`
then
	echo cshar: error transmitting "'correct'" '(should have been 5123 characters)'
fi
chmod +x 'correct'
#	End of shell archive
exit 0