[comp.unix.questions] script to mark misspelled words in a text file

plantz@manta.NOSC.MIL (Glen W. Plantz) (02/07/91)

I need a script to "mark" misspelled words in a text file. I am
aware of the Unix "spell" command, but I would like to know whether anyone
has a script (csh, awk, perl, etc.) that takes as parameters a file
containing a "local dictionary" of words to skip and the text file
to check, and outputs the same text file with the misspelled words marked
somehow.
	Thanks in advance,

==============================================================================
Glen Plantz				Computer Sciences Corporation
					4045 Hancock Street
email - plantz@nosc.mil			San Diego, CA  92110
phone - (619)225-2538
fax   - (619)226-0462
==============================================================================

tchrist@convex.COM (Tom Christiansen) (02/10/91)

From the keyboard of plantz@manta.NOSC.MIL (Glen W. Plantz):
:I need a script to "mark" misspelled words in a text file. I am
:aware of the Unix "spell" command, but I would like to know whether anyone
:has a script (csh, awk, perl, etc.) that takes as parameters a file
:containing a "local dictionary" of words to skip and the text file
:to check, and outputs the same text file with the misspelled words marked
:somehow.

Since I just wrote one of these, here's another.  It uses spell to get
a list of words to mark, then goes through the original input files,
marking any words that spell didn't like.

It can make the words stand out either the way diction does, or with
underlining or vt100 inverse video.  A local list of words can be
specified with a '-f file' option.

--tom

#!/usr/bin/perl

# $strategy = 'underline';  # uncomment to underline; default is 'standout'
$USAGE = "usage: $0 [-f localdict] files ...\n";

if ($ARGV[0] eq '-f') {
    shift;
    $WORDS = shift || die $USAGE;
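    # Parenthesized: "open WORDS || die" parses as open(WORDS || die ...),
    # so the die would never fire on a failed open.
    open(WORDS, $WORDS) || die "can't open $WORDS: $!\n";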
    while (<WORDS>) {
	chop;
	$seen{$_}++;
    } 
}

# You have to give at least one file so it can go through
# the file twice -- I didn't bother dup'ing the input stream.
die $USAGE unless @ARGV;

$code = "while (<>) {\n    study;\n";

if ($strategy ne 'underline') {
    # Pick one marker pair; with both enabled, the later assignment wins.
    # ($SO, $SE) = ('*[ ',  ' ]*');  # like diction
    ($SO, $SE) = ("\033[7m",  "\033[m"); # vt100 standout
}

open(SPELL, "spell @ARGV |") || die "can't run spell: $!\n";
while (<SPELL>) {
    chop;
    next if $seen{$_};
    ($lhs = $_) =~ s/(\W)/\\$1/g;  # escape each non-word char; maybe paranoid
    if ($strategy eq 'underline') {
	s/(.)/_\b$1/g;
	$code .= "    s/\\b$lhs\\b/$_/g;\n"; # add /i?
    } else {
	$code .= "    s/\\b$lhs\\b/$SO$_$SE/g;\n";
    }
} 
close(SPELL) || die "can't spell @ARGV";
$code .= "    print;\n}\n";
#print STDERR $code;
eval $code;
die $@ if $@;
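
To see the generated code in isolation, here is a minimal sketch of the
same build-then-eval technique with spell taken out of the picture.  The
helper name mark_words and the sample words are hypothetical and not part
of the script above, and the sketch leans on quotemeta and lexical
variables from later versions of perl rather than the idioms used in the
post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): mark each whole-word
# occurrence of the words in @$bad inside $text, diction-style.
sub mark_words {
    my ($text, $bad) = @_;
    my ($so, $se) = ('*[ ', ' ]*');

    # Build one s///g statement per word, the way the script above does.
    my $code = '';
    for my $word (@$bad) {
        my $lhs = quotemeta $word;    # escape regex metacharacters
        $code .= "s/\\b$lhs\\b/$so$word$se/g;\n";
    }

    # Run the generated statements against a private copy of the text.
    local $_ = $text;
    eval $code;
    die $@ if $@;
    return $_;
}

# The sample words are assumptions; in the script they come from spell.
print mark_words("I recieve teh mail in tehran.\n", ['recieve', 'teh']);
# prints: I *[ recieve ]* *[ teh ]* mail in tehran.
```

Each word becomes its own s///g statement, so a long spell list yields a
long generated block.  The \b anchors are what keep "teh" from matching
inside "tehran".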
--
 "All things are possible, but not all expedient."  (in life, UNIX, and perl)