[net.sources] common typo checker

jmg@dolphy.UUCP (Intergalactic Psychic Police Of Uranus) (10/13/85)

*** How about a computer program that can stand up for America! ***

Saiph Corporation (Laurant Imbaud, President) has been kind enough
to sponsor a major coming together of art and computers!  A technical
triumph that could only be compared to the intellectual magnitude of
the ACM.  Yes...

Are you tired of all those Communists causing your typing errors?
Do you want to solve the homeless problem while sitting behind a
	computer screen?

Then Lesbian Liberals of Sodom Corp has the program for you.

Yes, young professionals like you and me, Ninja Masters Of Self-Deception,
have joined forces in a worldwide circle dance reminiscent of those
dark days of the 60's when unrealistic youth banded together with forces
greater than themselves in a misguided effort to leave the days of light
and enlightenment known as the 50's, and we have brought you a computer
program that is the be all and end all of that science known as 'computers'.

Yes, 'typo' is it and it's yours for non-profit use only...just get naked,
turn on the news, close your eyes and say to yourself:

#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#	README.typo
#	typo.c
# This archive created: Sat Oct 12 20:48:59 1985
# Special Thanks: Taki's Donuts - the only Jewish/Arabic/Greek/Puerto Rican
#   donut house that's convenient in NYC.
# Thanks: Saiph Corp. Laurant Imbaud - President.
# By:	Intergalactic Psychic Police Of Uranus (Lesbian Liberals Of Sodom Corp.)
export PATH; PATH=/bin:$PATH
echo shar: extracting "'README.typo'" '(2252 characters)'
if test -f 'README.typo'
then
	echo shar: will not over-write existing file "'README.typo'"
else
cat << \SHAR_EOF > 'README.typo'
typo is a program to check for common typing errors.
I'm sure something like it exists already (so don't call home)
-- it's meant to be used by those that don't have such a thing.
It is useful with 'spell' & 'droff' to proofread texts.

typo was written to aid in the production of an esoteric art journal
we are putting out called, "The Act", which
is about 'performance art' or human activity.  "The Act" will cost
about $4 and I expect everyone on the net to run out and
buy one or to ask around for it...after all, as
dave@tektronix knows, we all live for computers... so,
it will be available as a winter/spring issue at an obscure,
peculiar bookshop near you.

The source is copyrighted by Jeffrey Greenberg 1985,
and released only for non-profit use, and,
furthermore, if you are doing 'defense' work you are forewarned
that the 'for' loops in this program spin at high speeds and have
usage guards, which, when they detect certain words and sentence structures,
will fall off, allowing the 'for' loop to break loose and smash through
the screen, lopping off both your arms and your 'advanced
projects' research grants.  (Poets needn't be told that 'typo' will
be of little use to them.)

Please mail any changes and enhancements to ihnp4!allegra!phri!dolphy!jmg

We don't even have 'nroff' let alone 'man' (or for that matter
spell or droff - we wrote our own half-baked versions) so here is a summary:

	typo [files]

	Check for common typing errors in texts.

	If 'files' are provided, typo examines them; otherwise it reads stdin.

	Will detect:
		floating letters (e.g. the man s jumped) - the 's',
		floating punctuation (e.g. the man . jumped) - the '.',
		repeated words (e.g. the the man jumped) - the 'the',
		nonsense words (e.g. the ma{n jumped) - the 'ma{n',
		nonsense punctuation (e.g. the >man< jumped) - the '>man<'.

	Will not detect:
		common transpositions: that's the job of 'spell',
		words with numbers in them: (ditto).

	Gets Confused by:
		ellipsis (...),
		some punctuation (such as 'e.g.', itself), and
		Names like 'McCarthy' with weird capitalization.

	Note: its idea of nonsense punctuation is stylistically
		constrained to the whims of the author; what it flags
		is not necessarily incorrect.

	Author: Jeffrey Greenberg 212-966-1334, (ihnp4!allegra!phri!dolphy!jmg)
SHAR_EOF
fi # end of overwriting check
echo shar: extracting "'typo.c'" '(6096 characters)'
if test -f 'typo.c'
then
	echo shar: will not over-write existing file "'typo.c'"
else
cat << \SHAR_EOF > 'typo.c'
/* Typo: check for common typographic errors
 * Copyright Jeffrey Greenberg, 1985. All rights reserved.
 * Usage for non-profit use only!
 * Thanks to 'Saiph Corp. Laurant Imbaud, President'.
 *
 * Version 1.
 *
 * Detects:
 *  The same word two or more times in a row
 *  Incorrect capitalization.
 *  nonsense words and punctuation.
 * Prints the word in question and the line number.
 *
 * compile: cc typo.c -o typo
 *
 * Please mail all changes & enhancements to ihnp4!allegra!phri!dolphy!jmg
 */

#include <stdio.h>
#include <ctype.h>
#include <string.h>	/* strcpy, strcmp, strlen, strtok */
#include <stdlib.h>	/* exit */

#define	BUFLEN	256

char *copyright;

/* The main idea is to read the file line by line, break it into some idea
 * of what a word is, then operate on that word.
 */
main(argc,argv)
int argc;
char *argv[];
{
	FILE *fopen();
	int arg = 1;

	copyright = "Jeffrey Greenberg 1985";

	if( argc == 1 ) {
		typo();
	}
	else {
		for( --argc; argc; argc--, arg++ ) {
			if( freopen( argv[ arg ],"r",stdin) == NULL ) {
				perror( argv[ arg ] );
				continue;
			}
			printf("typo: checking %s\n",argv[arg]);
			typo();
		}
	}

	bye();
}

typo()
{
	int word_count = 0, lineno = 0;
	char *word;

	for( ; get_word( &lineno, &word); word_count++ ) {
		capitalization( lineno, &word );
		dbl_word( lineno, &word );
		nonsense_word( lineno, &word );
	}

	printf("typo: %d lines, %d words checked\n\n",lineno, word_count);
}

capitalization( lineno, word )
int lineno;
char **word;
{
	int cap_count = 0, reg_count = 0;
	char *letter;

	/* If word starts with a capital, then if there are two or more caps
	 * and one or more regular letters, it's mis-capitalized.
	 * If word doesn't start with a capital, then if there is one or
	 * more capitalized letters, it's mis-capitalized.
	 *
	 * This algorithm fails for 'McCarthy'...
	 */

	/* SO, first count all caps and regular letters
	 */
	for( letter = (*word); *letter; letter++) {
		if( isupper( *letter ) )
			++cap_count;
		else if( islower( *letter ) )
			++reg_count;
	}

	if( (isupper( **word ) && cap_count > 1 && reg_count)
	||  (islower( **word ) && cap_count) )
		printf("typo: bad capitalization < %s >, line %d\n",*word,lineno);
}

/* If a word appears twice in a row, it's fucked
 */
dbl_word( lineno, word )
int lineno;
char **word;
{
	static char prevword[BUFLEN];
	static int prevlineno;

	if( ! (*prevword) ) {
		prevlineno = lineno;
		strcpy( prevword, *word);
		return;
	}

	if(strcmp( *word, prevword) == 0)
		printf("typo: repeated word < %s >, line %d and %d\n",
		prevword, lineno, prevlineno);

	prevlineno = lineno;
	strcpy( prevword, *word);
}

nonsense_word( lineno, word )
int lineno;	/* passed by value from typo() and printed with %d */
char **word;
{
	int length;

	length = strlen( *word );

	/* One letter words can only be 'a' or 'I' or certain
	 * punctuation.
	 */
	if( length == 1 ) {
		switch( **word ) {
		case '@':	/* misc designators */
		case '/':
		case '&':
		case '<':	/* math symbols */
		case '>':
		case '+':
		case '-':
		case '=':
		case 'a':	/* single character words */
		case 'A':
		case 'i':
		case 'I': break;
		default:
			if( isalpha( **word ) || ispunct( **word ) ) {
				printf("typo: floating < %s >, line %d\n",
					*word, lineno);
			}
			break;
		}
		return;
	}

	/* Multi-letter words can't be all consonants or punctuation
	 */
	{
	char *letter;
	int vowels = 0, consonants = 0, punctuation = 0, other = 0, is_bad = 0;
	for( letter = (*word); *letter; letter++) {

		/* If letter is non-ascii or a control character, it's bad.
		 */
		if( !isascii( *letter) || iscntrl( *letter) ) {
			++is_bad;
			break;
		}

		if( is_vowel( *letter) ) {
			++vowels;
			continue;
		}
		else
		if( isalpha( *letter )) {
			++consonants;
			continue;
		}
		else
		if( !ispunct( *letter )) {
			++other;
			continue;
		}
		++punctuation;

		/* If only punctuation at start, it must be
		 * right facing or quotes.
		 */
		if( vowels + consonants + other == 0 ) {
			if( !is_rpunct( *letter ) && !is_quotes( *letter) ) {
				++is_bad;
				break;
			}
		}
		else
		if( length - (vowels+consonants+other+punctuation) ) {
			/* two slashes in a row are bad.
			 * two apostrophes in a row are bad.
			 * a non-apostrophe, non-slash followed by a letter or
			 * by right punct is bad.
			 */
			if( *letter == '\'' && *(letter + 1) == '\'' ) {
				++is_bad;
				break;
			}
			else
			if( *letter == '/' && *(letter + 1) == '/' ) {
				++is_bad;
				break;
			}
			else
			if( (*letter != '\'' && *letter != '/') &&
			    (isalpha( *(letter + 1))||is_rpunct( *(letter + 1)))
			) {
				++is_bad;
				break;
			}
		}
	}

	/* A word made of just consonants is bad,
	 * but word made of just consonants and others is ok.
	 */
	if( !vowels && consonants && !other )
		printf("typo: nonsense word < %s >, line %d\n",
			*word, lineno);
	else
	if( punctuation == length || is_bad )
		printf("typo: nonsense punctuation < %s >, line %d\n",
			*word, lineno);

	}
}
is_quotes( ch )
char ch;
{
	switch( ch ) {
	case '`':
	case '\'':
	case '"': return 1;
	default: return 0;
	}
}

is_rpunct( ch )
char ch;
{
	switch( ch ) {
	case '`':
	case '(':
	case '[':
	case '<':
	case '{': return 1;
	default: return 0;
	}
}

is_vowel( ch )
char ch;
{
	switch( ch ) {
	case 'a':
	case 'e':
	case 'i':
	case 'o':
	case 'u':
	case 'y':
	case 'A':
	case 'E':
	case 'I':
	case 'O':
	case 'U':
	case 'Y': return 1;
	default: return 0;
	}
}

get_word( lineno, word )
int *lineno;
char **word;
{
	int c;
	char *strtok();
	static char line[BUFLEN];
	static int readline = 1;
/*	static char whitespace[] = " 	!@#$%^&*()_+~-=`{}[]|\\:;\"',<>.?/";*/
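	static char whitespace[] = " \t\n-"; /* space, tab, newline & hyphen */

	do {
		/* Get a line if we don't have one already.
		 * fgets() is bounded by BUFLEN (gets() is not) but keeps the
		 * newline, so newline is treated as whitespace above.
		 */
		if( readline ) {
			readline = 0;
			if( fgets(line, BUFLEN, stdin) == NULL ) {
				/* word not found; start fresh on the next file */
				readline = 1;
				return 0;
			}
			++(*lineno);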

			/* Parse the new line on whitespace
	 		 */
			if( (*word=strtok( line, whitespace )) == 0 )
				/* pexit("parsing word error"); */
				readline = 1;
		}
		else {

			/* Parse the old line on whitespace
			 */
			if( (*word=strtok( 0, whitespace )) == 0 )
				readline = 1;
		}
	} while( *word == 0 || **word == 0 ); /* don't dereference a null word */

	/* word found
	 */
	return 1;
}

bye()
{
/*	printf("typo: successful exit\n"); */
	exit(0);
}
SHAR_EOF
fi # end of overwriting check
#	End of shell archive
exit 0
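
For the curious: the repeated-word check in dbl_word() also works on its
own.  Below is a minimal sketch in present-day C; it is not part of the
archive above.  The helper name count_repeats and its whole-string
interface are made up for illustration.  It splits on space, tab, newline
and hyphen the way get_word() does, and compares each word with the one
before it using strcmp(), so the match is case-sensitive as in typo.c.

```c
#include <stdio.h>
#include <string.h>

#define BUFLEN 256

/* Hypothetical helper, not part of typo.c: counts adjacent repeated
 * words in `text`.  The text is copied first because strtok() writes
 * NUL bytes into the buffer it scans; input longer than BUFLEN - 1
 * characters is truncated in this sketch.
 */
int count_repeats(const char *text)
{
	static const char whitespace[] = " \t\n-";
	char buf[BUFLEN];
	char prev[BUFLEN] = "";	/* empty: strtok() never returns "" */
	char *word;
	int repeats = 0;

	strncpy(buf, text, BUFLEN - 1);
	buf[BUFLEN - 1] = '\0';

	for (word = strtok(buf, whitespace); word != NULL;
	     word = strtok(NULL, whitespace)) {
		if (strcmp(word, prev) == 0) {
			printf("repeated word < %s >\n", word);
			++repeats;
		}
		strcpy(prev, word);	/* safe: word is no longer than buf */
	}
	return repeats;
}
```

Calling count_repeats("the the man jumped") reports < the > once and
returns 1.  "The the" is not flagged, since the comparison respects case.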