[net.unix-wizards] Awk bug+fix

kg@hplabs.UUCP (Ken Greer) (03/02/84)

The following awk bug exists in all versions of awk I've looked at,
which include 4.1BSD, 4.2BSD, Sys 5 rel. 0, and Sys 5 rel. 2.

			Ken Greer
			hplabs!kg
			kg@HPLABS (CSNET)
			kg.hplabs@RAND-RELAY (ARPA)

The awk script,

END {
	print "First" > "junk"
	print "cat junk" | "sh"
	print "Second" > "junk"
    }

demonstrates two (and a half) bugs:

1) When finished, the file junk has two lines
   in it, not one.  The second > is treated like a >>.
2) The pipe on line three is executed after awk exits,
   not at the time of the call.
2.5) As a result of 1) and 2), the file cat-ed on line three
     shows two lines, not one.

The problems in awk are:

1) A file opened (with > or >>) is not closed after the
   statement is executed.  The file name is remembered
   and subsequent references to that file name use the same
   open descriptor.  (Efficient but wrong).

2) Pipes are not closed (with pclose).
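
To make point 1) concrete, here is a minimal standalone sketch of the
caching behaviour described above.  It is not the awk source: the names
cached_print, cached_close_all, count_lines and MAXOPEN are hypothetical
and only mirror the idea of the files[] table in run.c.  The file is
opened with "w" the first time its name is seen, and every later write
to the same name reuses the open stream, so nothing is truncated again.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXOPEN 10              /* hypothetical limit, in the spirit of FILENUM */

struct openfile {
    FILE *fp;                   /* NULL marks a free slot */
    char *name;
};

static struct openfile cache[MAXOPEN];

/* Write text to name, opening (and truncating) it only on first use.
 * Returns 0 on success, -1 on failure. */
int cached_print(const char *name, const char *text)
{
    int i, slot = -1;

    assert(name != NULL && text != NULL);

    for (i = 0; i < MAXOPEN; i++) {
        if (cache[i].fp == NULL) {
            if (slot < 0)
                slot = i;       /* remember the first free slot */
        } else if (strcmp(cache[i].name, name) == 0) {
            break;              /* name already open: reuse the stream */
        }
    }

    if (i == MAXOPEN) {         /* name not found: open it once */
        if (slot < 0)
            return -1;          /* table is full */
        char *copy = malloc(strlen(name) + 1);
        if (copy == NULL)
            return -1;
        strcpy(copy, name);
        FILE *fp = fopen(name, "w");
        if (fp == NULL) {
            free(copy);
            return -1;
        }
        cache[slot].fp = fp;
        cache[slot].name = copy;
        i = slot;
    }

    if (fputs(text, cache[i].fp) == EOF)
        return -1;
    return fflush(cache[i].fp) == EOF ? -1 : 0;
}

/* Close every cached stream and free the remembered names. */
void cached_close_all(void)
{
    for (int i = 0; i < MAXOPEN; i++) {
        if (cache[i].fp != NULL) {
            fclose(cache[i].fp);
            free(cache[i].name);
            cache[i].fp = NULL;
            cache[i].name = NULL;
        }
    }
}

/* Count newline characters in a file; returns -1 if it cannot be opened. */
int count_lines(const char *name)
{
    int c, lines = 0;
    FILE *fp = fopen(name, "r");

    if (fp == NULL)
        return -1;
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            lines++;
    fclose(fp);
    return lines;
}
```

Calling cached_print twice with the same file name, first with "First\n"
and then with "Second\n", leaves both lines in the file, which is the
two-line result reported for the script above.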

The diff for the fix is:
-----------------------

In awk/run.c (procedure redirprint):

< = old version, > = new version
11,16d10
< #define FILENUM	10
< struct
< {
< 	FILE *fp;
< 	char *fname;
< } files[FILENUM];
858d851
< 	register int i;
859a853
> 	FILE *fp;
863,870d856
< 	for (i=0; i<FILENUM; i++)
< 		if (strcmp(x->sval, files[i].fname) == 0)
< 			goto doit;
< 	for (i=0; i<FILENUM; i++)
< 		if (files[i].fp == 0)
< 			break;
< 	if (i >= FILENUM)
< 		error(FATAL, "too many output files %d", i);
872c858
< 		files[i].fp = popen(x->sval, "w");
---
> 		fp = popen(x->sval, "w");
874c860
< 		files[i].fp = fopen(x->sval, "a");
---
> 		fp = fopen(x->sval, "a");
876,877c862,863
< 		files[i].fp = fopen(x->sval, "w");
< 	if (files[i].fp == NULL)
---
> 		fp = fopen(x->sval, "w");
> 	if (fp == NULL)
879,884c865,869
< 	files[i].fname = tostring(x->sval);
< doit:
< 	fprintf(files[i].fp, "%s", s);
< #ifndef gcos
< 	fflush(files[i].fp);	/* in case someone is waiting for the output */
< #endif
---
> 	fprintf(fp, "%s", s);
> 	if (a == '|')
> 		pclose(fp);
> 	else
> 		fclose(fp);
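
For readers who want to try the patched behaviour outside awk, the
following is a rough standalone sketch of the open, write, close
sequence that the new redirprint performs.  The function names
print_and_close and file_equals are hypothetical, and the kind code 'a'
is only a stand-in for awk's append token; popen and pclose are assumed
to be available as on a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L /* asks for popen/pclose declarations */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Open target, write text, and close again straight away.
 * kind is '>' (truncate), 'a' (append, standing in for ">>"),
 * or '|' (pipe to a command).  Returns 0 on success, -1 on failure. */
int print_and_close(int kind, const char *target, const char *text)
{
    FILE *fp;
    int status;

    assert(target != NULL && text != NULL);

    if (kind == '|')
        fp = popen(target, "w");
    else if (kind == 'a')
        fp = fopen(target, "a");
    else
        fp = fopen(target, "w");
    if (fp == NULL)
        return -1;

    status = (fputs(text, fp) == EOF) ? -1 : 0;

    if (kind == '|') {
        /* pclose waits for the command to finish before returning */
        if (pclose(fp) == -1)
            status = -1;
    } else {
        if (fclose(fp) == EOF)
            status = -1;
    }
    return status;
}

/* Return 1 if the (small) file holds exactly the expected text, else 0. */
int file_equals(const char *name, const char *expected)
{
    char buf[256];
    size_t n;
    FILE *fp = fopen(name, "r");

    if (fp == NULL)
        return 0;
    n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    return strcmp(buf, expected) == 0;
}
```

With this sequence, writing "First\n" and then "Second\n" to the same
name with '>' leaves only the second line, and a command sent to "sh"
with '|' has finished by the time print_and_close returns.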


-- 
Ken Greer

david@iwu1a.UUCP (David Scheibelhut) (03/07/84)

The complaint was that the AWK statement:

	print "data" > "file"

sets the file to zero length only if the print is the first to write
to the file.  This behavior can be considered a bug or feature depending
on the point of view.  Because subsequent prints append to the file, one
need not include special code to initialize the file at the start of
execution.  Instead, using ">" on all print statements initializes the
file automatically and leaves it containing all output from the last run,
the most common need in AWK's transaction-oriented environment.

On the other hand, these semantics differ from those of the shell
(from which AWK acquired ">") and do not allow AWK to clear a file.  Which
is best?  AWK documentation is vague and allows both interpretations.
Thus the proposed change to AWK is not a bug fix but a change to a working
tool.  Because this change will break many existing AWK programs (including
many of mine) and because AWK has been quite static (the latest documentation
having been written about six years ago), I urge that this "bug" not be fixed.

Similar arguments can be made about the other "bugs" which were discussed:
the current implementation has advantages and the documentation, although
ambiguous, allows the current implementation.

	David Scheibelhut