[comp.sources.amiga] v02i106: stripcr - strip carriage returns from files

page@swan.ulowell.edu (Bob Page) (12/29/88)

Submitted-by: fgd3@jc3b21.uucp (Fabbian G. Dufoe)
Posting-number: Volume 2, Issue 106
Archive-name: utils/stripcr.1

#	This is a shell archive.
#	Remove everything above and including the cut line.
#	Then run the rest of the file through sh.
#----cut here-----cut here-----cut here-----cut here----#
#!/bin/sh
# shar:    Shell Archiver
#	Run the following text with /bin/sh to create:
#	Makefile
#	StripCR.c
#	StripCR.doc
#	StripCR.lnk
# This archive created: Wed Dec 28 14:49:27 1988
cat << \SHAR_EOF > Makefile
lc StripCR
blink with StripCR.lnk
SHAR_EOF
cat << \SHAR_EOF > StripCR.c
/* StripCR.c

   by Fabbian G. Dufoe, III
   6 December 1988

   This program is released into the public domain.

   This program copies its standard input to standard output, omitting
   carriage returns ('\n') unless it encounters a string of carriage returns
   longer than the number specified by the user.  If the user doesn't
   specify a number the program will omit single carriage returns but
   preserve any string of two or more.

   The program will not delete carriage returns that are followed by a blank
   unless the user specifies the -b flag.

   The user can specify the number of carriage returns to be stripped from
   the file with a command line option.

   Usage: StripCR [-b] [-sn]

   Flags    Description

      b     If this flag is specified strings of carriage returns followed
            by a blank are candidates for deletion.  The default is not to
            delete such carriage returns.  This feature preserves paragraphs
            which are identified by indenting.

      s     Spacing of document.  This flag allows the user to enter the
            number of consecutive carriage returns which will be stripped
            from the document.  The default value is 1.  That means single
            carriage returns will be omitted from the output file but
            strings of two or more carriage returns will be preserved.
            Setting n to any higher value will cause strings of n or fewer
            carriage returns to be omitted.  A string of carriage returns
            will be preserved only if there are more than n carriage returns
            in the string.
*/

#include <stdio.h>
#include <stdlib.h> /* atoi(), exit() */
#include <ctype.h>  /* toupper() */

#define FATAL 20

main(argc, argv)
int argc;
char **argv;
{
   int b = 0;
   /* If this is 0 don't delete strings of carriage returns if they are
      followed by a blank.  If it is 1 such strings may be deleted if they
      do not exceed the length specified by the user. */
   int c; /* Character read from standard input */
   int CRcnt = 0; /* Number of carriage returns in this string */
   int i; /* Loop counter */
   int s = 1; /* Length of longest carriage return string to be stripped */
   enum {CHAR, CR} STATE = CHAR;
   /* The program is implemented as a two-state machine.  It is either
      copying characters (CHAR) or accumulating carriage returns (CR). */
   void Usage(char *, char *); /* Display usage message and terminate. */

   /* Check the number of command line arguments. */
   if (argc > 3)
      Usage(argv[0], "Too many arguments.");

   /* Parse the command line. */
   for (i = 1; i < argc; i++)
   {
      /* If the user requested help display a message. */
      if (argv[i][0] == '?')
         Usage(argv[0], "Strips strings of n or fewer CRs.");
      if (argv[i][0] != '-')
         Usage(argv[0], "Invalid argument specification.");
      else
      {
         switch (toupper(argv[i][1]))
         {
         case 'B':
            b = 1;
            break;
         case 'S':
            s = atoi(&argv[i][2]);
            break;
         default:
            Usage(argv[0], "Unknown argument.");
         }
      }
   }

   /* Read standard input until end of file is encountered. */
   while ((c = getchar()) != EOF)
   {
      /* Which state are we in now? */
      switch (STATE)
      {
      /* We're copying characters. */
      case CHAR:
         if (c == '\n')
         {
            /* Finding a carriage return causes a state change. */
            STATE = CR;
            /* Count the first carriage return. */
            CRcnt++;
         }
         else
            /* It was just another character, copy it to standard output. */
            putchar(c);
         /* Read the next character. */
         break;
      /* We're counting carriage returns. */
      case CR:
         if (c == '\n')
            /* As long as we get carriage returns just add them to the
               count. */
            CRcnt++;
         else
         {
            /* When we stop getting carriage returns it's time to see if we
               should preserve them or not.  A blank after the run marks an
               indented paragraph, so we preserve the run unless the user
               gave the -b flag. */
            if (c == ' ' && b == 0)
               for (i = 0; i < CRcnt; i++)
                  putchar('\n');
            /* We preserve carriage returns if we've encountered a string of
               them longer than the length specified by the user. */
            else if (CRcnt > s)
               for (i = 0; i < CRcnt; i++)
                  putchar('\n');
            /* If neither condition for preserving a string of carriage
               returns is satisfied we replace them with a single blank. */
            else
               putchar(' ');
            /* Copy the transition character. */
            putchar(c);
            /* Change state. */
            STATE = CHAR;
            /* Clear the carriage return count. */
            CRcnt = 0;
         }
      }
   }

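   /* If input ended while we were counting carriage returns, write them
      out so the last line of the file is still terminated. */
   if (STATE == CR)
      for (i = 0; i < CRcnt; i++)
         putchar('\n');

   /* Signal normal completion and terminate. */
   return(0);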
}


void
Usage(pgm, msg)
char *pgm;
char *msg;
{
      fprintf(stderr, "Usage: %s [-b] [-sn]\n", pgm);
      fprintf(stderr, "   %s\n", msg);
      exit(FATAL);
}
SHAR_EOF
cat << \SHAR_EOF > StripCR.doc
StripCR Instructions

     Text editors like Ed and MicroEmacs are line oriented.  They put a 
carriage return at the end of each line on the screen.  Word processors like
WordPerfect and Textcraft and editors like Notepad are paragraph oriented.  
They use carriage returns to indicate the end of a paragraph.

     StripCR removes all the extra carriage returns from a file created 
with a line oriented editor so you can edit it with a paragraph oriented 
editor.  It tries to keep the paragraphs in the original document from 
running together by distinguishing between carriage returns at the end of 
lines and carriage returns that separate paragraphs.

     It will eliminate all the lone carriage returns from a file, leaving 
strings of multiple carriage returns untouched.  Typically, a single-spaced 
document will have two carriage returns at the end of a paragraph.  One 
marks the end of the paragraph's last line and the other makes the blank 
line between paragraphs.  StripCR leaves those two carriage returns in place
so you won't lose your paragraph definitions.

     You can control the number of carriage returns that will be removed. 
You specify the number with the "-s" option.  The number you supply tells 
the program it can delete a group of carriage returns as large as that 
number.  If the program finds more carriage returns than you specified it 
will preserve them.

     Another way the program tries to recognize paragraphs is by indented 
text.  If the next character after a carriage return is a blank the program 
won't remove that string of carriage returns unless you specify "-b" on the 
command line.  By using the "-b" flag you turn off that feature.
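
     The two rules above (the "-s" number and the check for indented text) 
come down to one decision the program makes each time a group of carriage 
returns ends.  The fragment below is only a sketch of that decision, not 
the code in StripCR.c.  The function name keep_run and its parameter names 
are made up for this illustration.

```c
/* Hypothetical helper, not part of StripCR.c: decide whether a group of
   `count` carriage returns followed by the character `next` is kept.
   `s` is the -s value (default 1); `b` is 1 if -b was given, else 0.
   Returns 1 to keep the group, 0 to replace it with a blank. */
int keep_run(int count, int next, int s, int b)
{
   /* Indented text marks a new paragraph unless -b turned that off. */
   if (next == ' ' && b == 0)
      return 1;
   /* Otherwise only groups longer than the -s value are kept. */
   return count > s;
}
```

     With the defaults, keep_run(1, 'T', 1, 0) says a lone carriage return 
goes, keep_run(2, 'T', 1, 0) says a blank line between paragraphs stays, 
and keep_run(1, ' ', 1, 0) says a carriage return before indented text 
stays.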

     When the program removes a carriage return (or a group of carriage 
returns) it puts a blank in their place.  That keeps the last word on a line
from being joined to the first word on the next line.
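
     To see the replacement at work on a small piece of text, here is a 
sketch that applies the same rules to a string in memory instead of to 
standard input.  It is an illustration under assumptions, not the shipped 
program: the name strip_cr_string and its arguments are invented here, and 
copying a trailing group of carriage returns through unchanged is a choice 
made for this sketch.

```c
#include <stddef.h>

/* Hypothetical string version of the filter, for illustration only.
   Copies `in` to `out`, replacing each group of at most `s` carriage
   returns with one blank.  `b` is 1 if -b was given, else 0.  `out`
   must have room for as many characters as `in`, plus the final '\0'. */
void strip_cr_string(const char *in, char *out, int s, int b)
{
   size_t o = 0; /* next free position in out */
   int run = 0;  /* carriage returns seen in the current group */
   int i;

   for (; *in != '\0'; in++)
   {
      if (*in == '\n')
      {
         /* Still inside a group; just count it. */
         run++;
         continue;
      }
      if (run > 0)
      {
         /* The group just ended: keep it or put a blank in its place. */
         if ((*in == ' ' && b == 0) || run > s)
            for (i = 0; i < run; i++)
               out[o++] = '\n';
         else
            out[o++] = ' ';
         run = 0;
      }
      /* Copy the ordinary character. */
      out[o++] = *in;
   }
   /* Copy through a group left over at the end of the input. */
   for (i = 0; i < run; i++)
      out[o++] = '\n';
   out[o] = '\0';
}
```

     Given "one\ntwo\n\nthree\n" with s set to 1 and b set to 0, the sketch 
produces "one two\n\nthree\n".  The lone carriage return becomes a blank 
and the pair between paragraphs is left alone.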

     The program works by copying standard input to standard output.  That 
keeps you from clobbering your input file.  If you don't like the result you
can try again with different command line options.

     To use the program issue the following command from the CLI:

          StripCR <InputPathName >OutputPathName [-b] [-sn]

where n is the size of the largest group of carriage returns that will be 
deleted.  The brackets indicate the -b and -s arguments are optional.

SHAR_EOF
cat << \SHAR_EOF > StripCR.lnk
FROM LIB:c.o+StripCR.o
TO StripCR
LIB +LIB:lc.lib+LIB:amiga.lib
MAP StripCR.map
SHAR_EOF
#	End of shell archive
exit 0
-- 
Bob Page, U of Lowell CS Dept.  page@swan.ulowell.edu  ulowell!page
Have five nice days.