[comp.sys.amiga] Text file tab utility

mark@unisec.USI.COM (Mark Rinfret) (03/23/87)

In the past few months, I've used about 3.5 different editors (the .5 is
for the ones I tried for a couple of days but gave up on quickly).  First,
there was Ed, next came MicroEmacs and now Z.  I'll not attempt to justify
my switching to Z, though it's the reason for this posting.  In the course
of all these changes, my "tabbing and indentation philosophy" (that sounds
lofty!) has undergone severe trauma.  With Ed, the default tab setting was
3 characters, but tab characters didn't actually get stored in the file.
With MicroEmacs, the default was 8 but if you changed it, you got spaces
(right?).  Finally, I have Z which, for all its faults, allows me to set
tabs where I want them (4) and stores true tab characters in the file.
This should result in a significant reduction in the sizes of some of my
sources, considering the 4:1 reduction of spaces to tabs.  Unfortunately,
most of my files are currently tabbed at 8 characters, with sprinklings
of spaces for intervening indentations.  If you find yourself in the same
or similar situation, the following utility may be of use to you.  This
trivial offering simply expands an input file using its current tab setting
(known by you, hopefully), reformatting the file with a new tab setting.
It does nothing about C indentation - just plays with tabs.  It is smart
in a small way about not entabbing quoted strings.  This is the first
opportunity I've had to offer anything to this group, though I've taken
much.  It's one of those stupid little things that anyone can write if they
need it badly enough - I did.  I compiled this under Aztec C V3.4 but it's
vanilla enough to port just about anywhere.

Mark
==========================================================================

/* :set ts=4 */
/*
 * Redefine tabs in a text file.
 * Mark Rinfret, 03/18/87			mark@unisec.USI.COM
 * Filename:	retab.c
 * 
 * Description:
 *	This program inputs a text file with one tab width setting and
 *	creates a new output file which has either a new tab setting or no
 *	tabs.  A few smarts have been included to avoid introducing tab
 *	characters into quoted strings and character constants.  This program
 *	only supports tab settings which are an even multiple of a given value.
 *	For instance, a tab width of 4 results in tab stops at columns
 *	5, 9, 13, etc.  Minimum tab width is 3 columns, maximum is 32.
 *
 * Examples:
 *
 *	retab -i8 -o4 infile outfile
 *		Converts infile, currently set at 8 column tabs to outfile which
 *		will have 4 column tabs.
 *
 *	retab -i8 -o0 -q infile outfile
 *		Expands all tabs in infile and places the result in outfile,
 *		suppressing statistical info.  There will be no tab characters
 *		in outfile.
 *
 *	retab infile outfile
 *		Converts infile, currently set at 4 column tabs, to outfile, which
 *		will also have 4 column tabs.  Used in this manner, a cleanup
 *		function is provided, optimizing file size by replacing spaces
 *		with tabs, where possible.
 *
 *	The author releases this source to the public domain with no
 *	restrictions which means that you can use, rewrite, redistribute,
 *	sell or eat it.
 */


#include <stdio.h>
#include <ctype.h>

#define LINEMAX		255		/* max input line length */
#define MAXTAB		32		/* maximum tab width allowed */
#define MINTAB		3		/* minimum tab width allowed (except 0) */
#define TABIN		4		/* default input file tab setting */
#define TABOUT		4		/* default output file tab setting */

FILE *OpenFile();

FILE *infile,*outfile;		/* input / output files */
char *iname, *oname;		/* input / output file names */
char linebuf[LINEMAX+1];	/* line buffer */
unsigned intab = TABIN, outtab = TABOUT;
unsigned incol, outcol;
unsigned iccnt = 0, occnt = 0, ilcnt = 0, olcnt = 0;
unsigned statistics = 1;

main(argc,argv)
	int argc; char **argv;
{
	char c,*arg;

	++argv;								/* skip program name */

	while (--argc && **argv == '-') {
		arg = *argv;
		if ((c = *++arg) == 'i') {
			intab = atoi(++arg);
			if (intab == 0)				/* zero is only legal for -o; */
				Usage();				/* it would divide by zero in retab() */
			cktab(intab);				/* check tab value */
		}
		else if (c == 'o') {
			outtab = atoi(++arg);
			cktab(outtab);
		}
		else if (c == 'q')				/* quiet mode */
			statistics = 0;
		else
			Usage();					/* bad option */
		++argv;							/* point to next arg */
	}
	if (argc < 2)
		Usage();

	iname = argv[0];
	oname = argv[1];
	infile = OpenFile(iname,"r");
	outfile = OpenFile(oname,"w");
	retab();
	stats();
}

/* Perform the retabbing function. */

retab()
{
	int c;
	unsigned endfile = 0;

	while (!endfile) {
		incol = 1;
		outcol = 0;
		while ((c = fgetc(infile)) != '\n') {
			if (c == EOF) {
				++endfile;
				break;
			}
			++iccnt;
			if (c == '\t') {			/* input was a tab? */
				do  {
					putbuf(' ');
				} while (outcol % intab != 0);
			}
			else {
				putbuf(c);
			}
		}
		if (c == '\n') {
			++iccnt;
			++ilcnt;
		}
		else if (outcol)			/* something on last line? */
			++ilcnt;

		linebuf[outcol] = '\0';
		outline();
	}
}

/* Put one character in the line buffer, testing for overflow and
 * maintaining the output column, outcol.
 */

putbuf(c)
{
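	if (outcol == LINEMAX) {			/* buffer full? */
		outline();						/* flush what we have so far */
		outcol = 0;						/* and start refilling the buffer */
	}
	linebuf[outcol++] = c;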
}

/* Output the current line. */

outline()
{
	char c,*s;
	unsigned blanks = 0,escape = 0, i, j, outpos = 0, quote = 0;
	unsigned pass_through;

	s = linebuf;
	for (i = 0; i < outcol; ++i) {		/* scan characters in buffer */
		if (outtab) {					/* entab output line? */
			if (i % outtab == 0) {		/* at a tab stop? */
				if (blanks && !quote) {
					if (blanks > 1) {
						fputc('\t',outfile);
						++occnt;
						outpos  = i;
					}
					else {
						fputc(' ',outfile);
						++occnt;
						++outpos;
					}
					blanks = 0;
				}
			}
			pass_through = 0;			/* allow blank checking */
			c = *s++;					/* get next character */
			if (escape) {				/* pass through as is */
				escape = 0;
				++pass_through;
			}
			else if (c == '"' || c == '\'') {	/* quotes? */
				if (quote) {
					if (quote == c) quote = 0; /* end of quote */
				}
				else
					quote = c;
				++pass_through;
			}
			else if (c == '\\') {		/* character escape */
				escape = 1;
				++pass_through;
			}
			if (c == ' ' && !pass_through)
				++blanks;
			else {
				blanks = 0;
				while (outpos < i) {
					fputc(' ',outfile);
					++occnt;
					++outpos;
				}
				fputc(c,outfile);
				++occnt;
				++outpos;
			}
		}
		else {
			fputc(*s++,outfile);
			++occnt;
		}
	}
	fputc('\n',outfile);					/* line terminator */
	++olcnt;							/* count output lines */
}

/* Display correct program usage. */

Usage()
{
	printf(
"Usage: retab [-i<input tab>] [-o<output tab>] [-q] <input> <output>\n\n");
	printf(
"<input tab> is the tab value of the input file.  If not given,\n");
	printf("4 is assumed.\n");
	printf(
"<output tab> is the new tab value for the output file.  If not\n");
	printf(
"given, 4 is assumed.  Zero is also legal, the net effect of which\n");
	printf(
"is to expand tabs in the input file to spaces.\n");
	printf(
"-q specifies quiet mode - no statistics will be output.\n");
	exit(1);
}

/* Check tab value for allowable range */

cktab(val)
	unsigned val;
{
	if (val && (val < MINTAB || val > MAXTAB)) {
		printf("Tab value must be in the range of %d..%d\n",
			MINTAB,MAXTAB);
		exit(1);
	}
}

FILE *OpenFile(name,how)
	char *name, *how;
{
	FILE *fp;

extern int errno;

	if ((fp = fopen(name,how)) == NULL) {
		printf("Can't open %s for %s access, errno is %d.\n",name,how,errno);
		exit(1);
	}
	return fp;
}

/* Report program statistics. */

stats()
{
	char *format = "  %s %-12s  Tabs: %2u, %5u Characters, %5u Lines\n";

	if (statistics) {
		printf("retab statistics:\n");
		printf(format,"Input:  ", iname, intab, iccnt, ilcnt);
		printf(format,"Output: ", oname, outtab, occnt, olcnt);
	}
}


-- 
| Mark R. Rinfret, SofTech, Inc.		mark@unisec.usi.com |
| Guest of UniSecure Systems, Inc., Newport, RI                     |
| UUCP:  {gatech|mirror|cbosgd|uiucdcs|ihnp4}!rayssd!unisec!mark    |
| work: (401)-849-4174	home: (401)-846-7639                        |

cjp@vax135.UUCP (03/24/87)

In article <443@unisec.USI.COM> mark@unisec.USI.COM (Mark Rinfret) writes:
>(right?).  Finally, I have Z which, for all its faults, allows me to set
>tabs where I want them (4) and stores true tab characters in the file.

Well for my money, I like Z a lot but the tabs handling is one thing
that's badly flawed.  I don't care a bit whether the file is a few
percent larger, but it is imperative that I be able to set 4-wide
indentation stops that *print* *out* *on* *my* *printer* as 4-wide.  My
printer has only 8-space tabs and it does not auto-wrap.  A major point
of 4-spaces indentation is that I can more easily read deeply nested
code.  What use is it if that info all falls into a blot in the 80th
column?  And it's a pain in the behind to run a reformatter for each
printing.  In short: Jim, please bring back the vi usage of ^T, ^D and
optimize out any superfluous spaces on the fly.

	Charles Poirier

dillon@CORY.BERKELEY.EDU.UUCP (03/24/87)

	Both VI and Z have the same problem... namely that there is only one
type of 'tab'.  When you set tabs to 4 in either VI or Z, it writes out
to the file using the tab character but assuming it's 4 rather than 8.

	Rightly, you should either have two separate variables, or should
always write out files using tabs of 8 (i.e. translate the internal tab
format of 4 to the external format of 8 without affecting the apparent text).

				-Matt

cjp@vax135.UUCP (03/24/87)

In article <8703240644.AA00426@cory.Berkeley.EDU> dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
>
>	Both VI and Z have the same problem... namely that there is only one
>type of 'tab'.  When you set tabs to 4 in either VI or Z, it writes out
>to the file using the tab character but assuming it's 4 rather than 8.

Are we speaking of the same vi?  On 4.2BSD, :set shiftwidth=4, then use
^T to shift line right, ^D to shift left (insert mode), or >>, << command mode.
You get all (8-space) tabs except the rightmost 4 spaces every other shift.
(I like this arrangement by the way.)

	Charles Poirier

mark@unisec.UUCP (03/25/87)

In article <1814@vax135.UUCP>, cjp@vax135.UUCP (Charles Poirier) writes:
> In article <443@unisec.USI.COM> mark@unisec.USI.COM (Mark Rinfret) writes:
> >(right?).  Finally, I have Z which, for all its faults, allows me to set
> >tabs where I want them (4) and stores true tab characters in the file.
> 
> Well for my money, I like Z a lot but the tabs handling is one thing
> ...
> ...  , but it is imperative that I be able to set 4-wide
> indentation stops that *print* *out* *on* *my* *printer* as 4-wide.  My
> printer has only 8-space tabs and it does not auto-wrap.  A major point
> of 4-spaces indentation is that I can more easily read deeply nested
> code.

I hear you!  One of the first things I did when I started developing code
on the Amiga was to get a "detabbing print utility" (expands tabs) and
customize it for my own use.  The command line takes an option (-i<tabs>)
which allows you to specify what your tab setting is.  I've also added
Unix-style wildcarding (via Aztec's "scdir" function).  It (pr) also outputs
an optional header with filename, date, page number and the line number of
the first line on the page.  My reason for posting this rather than sending 
e-mail is to get some response/opinion to a concern related to 
"public domain" software.

Just for fun, I added a copy of C. Heath's "getfile" requester package
so that you can call my "pr" without filename parameters and it will
put up a requester (it currently only works for one file, but I could
easily add a "more?" loop to it).  Here's the rub - I modified the
"getfile" package to return a status code which informs me when the
CANCEL gadget has been clicked.  The author explicitly states in his
source that no modified version of this package is to be released without
clearing it through him (though it is on the update disks for the Aztec C
compiler and probably a thousand other BBS's, etc.).  I have so far
respected the author's wishes, but I have no desire to send a letter out
into the void, waiting for "permission" to re-release the modified source.

What I'm leading up to is this - would it be ethical for me to maintain
opening credits to C. Heath but rename the modified routine?  I've been
hesitant to do this since I don't want to step on any toes.  I could just
release the source (just another dumb little tool, mind you) with pointers
to the changes that must be made, but that's a hassle.  When I put things
in the public domain (not much for Amiga yet, much for C64), I wave
goodbye and encourage the world to do with it as they will.  Though I am
grateful to C. Heath for releasing his code in the first place, I wish he
had been less restrictive in his "conditions".  Thanks for listening.

Mark


glee@cognos.UUCP (Godfrey Lee) (03/27/87)

I think all editors should support settable tabs. The tabs should be stored
as tabs in the file. I hate editors that change what I type in, because I
always end up in the situation of not being able to produce a file of
exactly what I want. Vi does not commit that sin, aside from NULs, you
can get anything into the file.

If your printer doesn't support settable tabs, your print program should.
If your print program doesn't, use "detab" or "expand" (they are also trivial
to write).
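
As a rough illustration of how small such a "detab" can be, here is a minimal sketch in C.  It is not the Unix expand(1) source or anyone's posted utility; the function name detab_line and its buffer-based interface are made up for this example, and it assumes tab stops at every multiple of a single width.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: expand the tabs in `in` to spaces, assuming tab
 * stops every `tab_width` columns.  Writes at most out_size - 1
 * characters plus a terminating NUL to `out` and returns the number of
 * characters written (output is silently truncated if `out` is too small).
 */
size_t detab_line(const char *in, unsigned tab_width,
                  char *out, size_t out_size)
{
    size_t col = 0;     /* 0-based column on the current output line */
    size_t n = 0;       /* characters written so far */

    assert(tab_width > 0 && out_size > 0);

    for (; *in != '\0' && n + 1 < out_size; ++in) {
        if (*in == '\t') {
            /* pad with spaces up to the next tab stop */
            do {
                out[n++] = ' ';
                ++col;
            } while (col % tab_width != 0 && n + 1 < out_size);
        } else {
            out[n++] = *in;
            col = (*in == '\n') ? 0 : col + 1;   /* newline resets column */
        }
    }
    out[n] = '\0';
    return n;
}
```

Calling detab_line("ab\tc", 8, buf, sizeof buf) pads "ab" out to column 8 before the "c"; a print filter would apply the same loop to each line it reads.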
-- 
-----------------------------------------------------------------------------
Godfrey Lee, Cognos Incorporated, 3755 Riverside Drive,
Ottawa, Ontario, CANADA  K1G 3N3
(613) 738-1440		decvax!utzoo!dciem!nrcaer!cognos!glee

vanam@pttesac.UUCP (03/28/87)

Here's my 2 cents on the tab subject.  I think Z is doing it just right
when it displays tabs at whatever setting you choose, but still stores
them internally as tab characters.  It's not the fault of Z if a particular
printer forces tabs to be every 8 characters.  It's up to the editor, the
printer driver (or printer itself) to allow the user to set tab stops
wherever she wants.

Anyhow, that's my opinion.

Marnix
-- 
Marnix (ain't unix!) A.  van\ Ammers	Work: (415) 545-8334
Home: (707) 644-9781			CEO: MAVANAMMERS:UNIX
UUCP: {ihnp4|ptsfa}!pttesac!vanam	CIS: 70027,70

cjp@vax135.UUCP (03/31/87)

In article <402@pttesac.UUCP> vanam@pttesac.UUCP (Marnix van Ammers) writes:
>them internally as tab characters.  It's not the fault of Z if a particular
>printer forces tabs to be every 8 characters.  It's up to the editor, the
>printer driver (or printer itself) to allow the user to set tab stops
>wherever she wants.

I am not flaming here, but to defend my point:

Do you propose then that I also modify type, more, and every other
editor I occasionally use, as well as the printer device, to compensate
for Z's too-simple treatment of indents?  Sir, I claim that 8-space
tabs are the standard and variable-length tabs are a frill, a kludge,
and a hack.  Vi did it right.  I also claim that there is *no* saving
in file length by using 4-space tabs as opposed to all 8-space tabs
plus occasional runs of 4 spaces, for source code indented an average
of roughly 3 times or more per line.  (Analysis available on request.)
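
The per-line arithmetic behind a comparison like this can be sketched in a few lines of C.  This is not the analysis offered above, only an illustration under stated assumptions: each indent level is 4 columns, one scheme stores one tab per level, and the other stores 8-column tabs followed by whatever spaces are left over.  The function names are invented for the example.

```c
#include <assert.h>
#include <stdio.h>

/* One tab per indent level (file meant for 4-column tab stops). */
unsigned bytes_tab_per_level(unsigned level)
{
    return level;
}

/* Same visual indent stored as 8-column tabs plus leftover spaces,
 * with `shift` columns per indent level. */
unsigned bytes_tab8_plus_spaces(unsigned level, unsigned shift)
{
    unsigned col;

    assert(shift > 0);
    col = level * shift;            /* column the text starts in */
    return col / 8 + col % 8;       /* whole tabs, then spaces */
}

/* Print the per-line cost of both schemes for levels 0..max_level. */
void print_indent_costs(unsigned max_level)
{
    unsigned level;

    printf("level  ts=4  ts=8,sw=4\n");
    for (level = 0; level <= max_level; ++level)
        printf("%5u  %4u  %9u\n", level,
               bytes_tab_per_level(level),
               bytes_tab8_plus_spaces(level, 4));
}
```

Under these assumptions a line at level 2 costs 2 bytes with one tab per level and 1 byte with 8-column tabs, while a line at level 3 costs 3 bytes versus 5.  Even levels favor the 8-column scheme and odd levels favor one tab per level, so the total for a real file depends on how its lines are distributed across indent depths.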

	Charles Poirier   vax135!cjp

dillon@CORY.BERKELEY.EDU.UUCP (04/01/87)

>I think all editors should support settable tabs. The tabs should be stored
>as tabs in the file. I hate editors that change what I type in, because I
>always end up in the situation of not being able to produce a file of
>exactly what I want. Vi does not commit that sin, aside from NULs, you
>can get anything into the file.
>
>If your printer doesn't support settable tabs, your print program should.
>If your print program doesn't, use "detab" or "expand" (they are also trivial
>to write).

	I disagree.  Editors such as VI, EMACS, ED, and DME write out files
as normal text files, and thus there is no way for the printer driver to
know what tab size to use unless *you* tell it.. for each file.  I think
the only way one can avoid having to know what the tabsize should be for
a given file is to always use a standard tab size (i.e. 8) when reading and
writing files, and to convert to whatever internal tabbing you prefer.  Then,
you could VI, EMACS, ED, or DME any arbitrary programmer's files without
knowing which tab size he likes to use.

	To prevent misunderstanding, here is an example:

	person A uses a tab size of 7 inside his editor.  He has the following
	line IN THE EDITOR:

	<TAB><TAB>x

	He then writes the file to disk.  On disk, the tabs are 8, and the file
	looks like this:

	<TAB><6 spaces>x

	ANYBODY can then load that file into their own editor with their own
	personal tabbing... since in the load process the editor *knows* the
	tabs are always 8 on disk, and converts.  So person B likes tabs of 4.
	He loads person A's file and gets this:

	<TAB><TAB><TAB><2 spaces>x

	NOTE: The text file looks *exactly* the same whether you CAT it from
	disk, or EDIT it with your favorite tabbing.
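
	The conversion in this example comes down to two small steps, which
	might be sketched in C as follows.  This is only an illustration, not
	code from any of the editors named; indent_column and make_indent are
	made-up names, and only leading whitespace is handled.

```c
#include <assert.h>
#include <stddef.h>

/* Column (0-based) reached after the leading tabs and spaces of `line`,
 * with tab stops every `tab_width` columns. */
unsigned indent_column(const char *line, unsigned tab_width)
{
    unsigned col = 0;

    assert(tab_width > 0);
    for (; *line == '\t' || *line == ' '; ++line) {
        if (*line == '\t')
            col += tab_width - col % tab_width;   /* jump to next stop */
        else
            ++col;
    }
    return col;
}

/* Write a tabs-then-spaces run that reaches `col` when tab stops are
 * every `tab_width` columns.  Returns the number of bytes written, not
 * counting the NUL.  `out` must have room for col + 1 bytes. */
size_t make_indent(unsigned col, unsigned tab_width, char *out)
{
    size_t n = 0;
    unsigned i;

    assert(tab_width > 0);
    for (i = 0; i < col / tab_width; ++i)
        out[n++] = '\t';
    for (i = 0; i < col % tab_width; ++i)
        out[n++] = ' ';
    out[n] = '\0';
    return n;
}
```

	With the numbers above, indent_column("\t\tx", 7) gives column 14;
	make_indent(14, 8, buf) then produces one tab and 6 spaces for the
	disk copy, and make_indent(14, 4, buf) produces three tabs and
	2 spaces for person B.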


	I personally use tabs of 4 in VI when I'm using UNIX systems, and I
find it a bi#$@ch to have to 'expand -4' 40 source files before sending them
to the printer.  

				-Matt

farmer@ico.UUCP (David Farmer) (04/02/87)

In article <8704010701.AA20834@cory.Berkeley.EDU> dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:

:>I think all editors should support settable tabs. The tabs should be stored
:>as tabs in the file. I hate editors that change what I type in, because I
:>always end up in the situation of not being able to produce a file of
:>exactly what I want. Vi does not commit that sin, aside from NULs, you
:>can get anything into the file.

Actually it does mangle characters with the HI-BIT set.

:>
:>If your printer doesn't support settable tabs, your print program should.
:>If your print program doesn't, use "detab" or "expand" (they are also trivial
:>to write).
:
:       I disagree.  Editors such as VI, EMACS, ED, and DME write out files
:as normal text files, and thus there is no way for the printer driver to
:know what tab size to use unless *you* tell it.. for each file.
: ...
:       I personally use tabs of 4 in VI when I'm using UNIX systems, and I
:find it a bi#$@ch to have to 'expand -4' 40 source files before sending them
:to the printer.
:
:                               -Matt

Exactly.  But I would suggest that with VI you set your shift-width (sw) to
4, and leave your tab-stops (ts) set at 8.  This way, if you make use of
auto indent, and << and >>, VI will automatically use tabs, or 4 spaces where
appropriate.  The only drawback is that when I am typing an indented line,
I sometimes hit the TAB key and then have to BACKSPACE and type 4 spaces,
since a TAB put me in too far.

I hope this discussion doesn't continue forever, but nobody seems to have
mentioned this so far.

David Farmer.
Disclaimers?