[comp.std.unix] awk syntax

peter@ficc.ferranti.com (Peter da Silva) (05/14/91)

Submitted-by: peter@ficc.ferranti.com (Peter da Silva)

In article <1991May11.224436.25175@uunet.uu.net> arnold@audiofax.com writes:
> (How do I know that        awk 'BEGIN { print "hi" } ; END { print "bye" }'
> is legal while             awk 'BEGIN { print "hi" }   END { print "bye" }'
> isn't?  Presenting a grammar for the language is almost a necessity...)

It isn't? I use the latter all the time!
-- 
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX  77487-5012;         `-_-' "Have you hugged your wolf, today?"


Volume-Number: Volume 23, Number 66

arnold%audiofax.com@mathcs.emory.edu (Arnold Robbins) (05/14/91)

Submitted-by: arnold%audiofax.com@mathcs.emory.edu (Arnold Robbins)

>In article <1991May11.224436.25175@uunet.uu.net> arnold@audiofax.com writes:
>> (How do I know that        awk 'BEGIN { print "hi" } ; END { print "bye" }'
>> is legal while             awk 'BEGIN { print "hi" }   END { print "bye" }'
>> isn't?  Presenting a grammar for the language is almost a necessity...)

In article <1991May13.222855.9433@uunet.uu.net> peter@ficc.ferranti.com (Peter da Silva) writes:
>It isn't? I use the latter all the time!

History lesson time.  First of all, POSIX awk is "new" awk, not old awk.
It is based on the awk described in the 1988 book by Aho, Weinberger and
Kernighan, The AWK Programming Language.  It has some additional features
that have gone into both AT&T and GNU awk.

One of the things that happened when new awk was first released was a lot
of cleaning up and consistencizing (if I may coin a term) of the awk language.
In particular, rules had to be separated by either a newline or a semi-colon,
just like the statements inside an action.  Here's a real live example
from my V.3.2 system:

	Script started on Tue May 14 12:25:28 1991
	audiofax1> rlogin tiktok
	Password:
	
	ESIX System 5.3.2 Rev.D
	Copyright (C) 1984, 1986, 1987, 1988 AT&T
	Copyright (C) 1987, 1988 Microsoft Corp.
	Copyright (C) 1988, 1989, 1990 Everex Systems, Inc.
	All Rights Reserved
	Login last used: Tue May 14 12:24:29 1991
	TERM=at386
	tiktok> nawk 'BEGIN { print "hi" } ; END { print "bye" }' /dev/null
	hi
	bye
	tiktok> nawk 'BEGIN { print "hi" }  END { print "bye" }' /dev/null 
	nawk: syntax error at source line 1
	 context is
	        BEGIN { print "hi" }  >>>  END <<<  { print "bye" }
	nawk: bailing out at source line 1
	tiktok> 
	Connection closed.
	audiofax1> 
	script done on Tue May 14 12:26:28 1991

Based on a cursory reading of the grammar in the POSIX spec, this rule
still applies: rules must be separated by a newline or a semi-colon.

Alas, some time back, backwards compatibility reared its ugly head within
AT&T, and for V.4 nawk, Brian Kernighan "fixed" things so that the separator
is no longer necessary.  (This was at the request of the System V folks.)
David Trueman went ahead and fixed gawk to be the same way (adding heavily
to the number of shift/reduce conflicts in the grammar).

So, the upshot is that technically, leaving out the semi-colon or newline
is not legal, but most likely you can get away with it.
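For scripts that have to run under the strict V.3.2 nawk as well as the more forgiving V.4 nawk and gawk, the safe habit is simply to always write the separator. A minimal sketch, assuming a POSIX shell with a new-style awk installed as `awk`; the two function names are hypothetical and exist only to put the two accepted forms side by side:

```shell
#!/bin/sh
# Two ways of separating awk rules that the strict new-awk grammar
# accepts; the more lenient awks accept them too.
# Function names are hypothetical, for illustration only.

rules_with_semicolon() {
    # ';' after the closing brace ends the BEGIN rule,
    # just as it would end a statement inside an action.
    awk 'BEGIN { print "hi" } ; END { print "bye" }' /dev/null
}

rules_with_newline() {
    # A newline after the closing brace does the same job.
    awk 'BEGIN { print "hi" }
END { print "bye" }' /dev/null
}
```

Both functions print `hi` followed by `bye`. The newline form is also the one that reads like an ordinary multi-line awk program, so it tends to be the natural choice in script files.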
-- 
Arnold Robbins				 AudioFAX, Inc. | Threads are the
2000 Powers Ferry Road, Suite 200 / Marietta, GA. 30067 | lack of an idea.
INTERNET: arnold@audiofax.com  Phone:   +1 404 618 4281 |     -- Rob Pike
UUCP:	  emory!audfax!arnold  Fax-box: +1 404 618 4581 |

[ I think this discussion is getting more towards the realm of 
  comp.unix.questions.  I'm keeping this part of it here because it is
  related to *nix standards and a good example of how fun they are. -- mod ]
Volume-Number: Volume 23, Number 68

henry@zoo.toronto.edu (Henry Spencer) (05/14/91)

Submitted-by: henry@zoo.toronto.edu (Henry Spencer)

In article <1991May13.222855.9433@uunet.uu.net> peter@ficc.ferranti.com (Peter da Silva) writes:
>>  awk 'BEGIN { print "hi" }   END { print "bye" }'   [not legal?]
>
>It isn't? I use the latter all the time!

You obviously missed that sparkling gem of technical presentation, "Awk
as a serious systems programming language", my paper at the last Usenix. :-)
I raised this specific issue as a needless incompatibility between different
awks.  The problem is that awk has never been specified precisely enough to
definitively say that this was not legal, and it worked in a lot of the early
awks, so people got used to it.  At least one more recent interpretation,
reading the rather fuzzy documentation narrow-mindedly, has outlawed it.
Alas, it sounds like POSIX is legitimizing this mistake, thereby breaking
quite a bit of existing practice.
-- 
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu  utzoo!henry


Volume-Number: Volume 23, Number 69

peter@ficc.ferranti.com (Peter da Silva) (05/15/91)

Submitted-by: peter@ficc.ferranti.com (Peter da Silva)

In article <1991May14.185737.15746@uunet.uu.net> arnold@audiofax.com writes:
> One of the things that happened when new awk was first released was a lot
> of cleaning up and consistencizing (if I may coin a term) of the awk language.

I don't see how that makes things any more consistent. If you look at the
grammar there's no ambiguity that needs to be resolved by adding that
semicolon. Does anyone have an idea what the reasoning behind this was?
To me, it adds confusion by treating a block as a statement.

Oh, and my V.3.2 system has no problem with that:

% ls -l | awk 'NF==9 { h[$3] += $5 } END {for(i in h) print i,h[i]}'
root 7985
peter 731662

(from a script I have lying around)
-- 
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX  77487-5012;         `-_-' "Have you debugged your wolf, today?"


Volume-Number: Volume 23, Number 71

arnold%audiofax.com@mathcs.emory.edu (Arnold Robbins) (05/16/91)

Submitted-by: arnold%audiofax.com@mathcs.emory.edu (Arnold Robbins)

>In article <1991May14.185737.15746@uunet.uu.net> arnold@audiofax.com writes:
>> One of the things that happened when new awk was first released was a lot
>> of cleaning up and consistencizing (if I may coin a term) of the awk 
>> language.

In article <1991May15.165824.6896@uunet.uu.net> peter@ficc.ferranti.com (Peter da Silva) writes:
>I don't see how that makes things any more consistent. If you look at the
>grammar there's no ambiguity that needs to be resolved by adding that
>semicolon. Does anyone have an idea what the reasoning behind this was?
>To me, it adds confusion by treating a block as a statement.

This is getting off the topic of standards, but what the heck.  You
omitted my rationalization of the consistency.  To rephrase: statements
at the rule level should be consistent with statements inside an action.
Statements in an action are separated by newline or semi-colon, therefore
rules (patterns plus actions) should also be separated by newlines or
semi-colons.  It is illegal to type

	{ i = 1  j = 2 }

in an action without the semicolon or newline between the assignments.
Therefore it "should" be illegal to type rules without the separator.
(So yes, blocks are statements.  This makes sense, since they're executed
in the order they occur in the program.)
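The statement-level half of the analogy can be seen in a single action. A small sketch, assuming a POSIX shell and any new-style awk; the function names are made up for illustration. Each function separates the two assignments with one of the two separators the action grammar allows:

```shell
#!/bin/sh
# Inside an action, consecutive statements need a separator.
# Function names are hypothetical, chosen only for this illustration.

assignments_with_semicolon() {
    # ';' between the statements: legal.
    awk 'BEGIN { i = 1; j = 2; print i + j }'
}

assignments_with_newline() {
    # A newline between the statements: equally legal.
    awk 'BEGIN {
        i = 1
        j = 2
        print i + j
    }'
}

# By contrast, awk 'BEGIN { i = 1  j = 2 }' is rejected with a
# syntax error, because nothing separates the two assignments.
```

Both functions print `3`. The rule-level separator Arnold describes is the same idea applied one level up, between whole pattern-action pairs.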

As I also mentioned, modern implementations of 'nawk' (V.4 nawk, gawk)
accept rules with or without the semicolon, so it doesn't really matter.
(Many C compilers continue to accept `i =+ 1' but that doesn't make it
good programming practice...)

>Oh, and my V.3.2 system has no problem with that:
>
>% ls -l | awk 'NF==9 { h[$3] += $5 } END {for(i in h) print i,h[i]}'
>root 7985
>peter 731662

You typed "awk", no 'n'.  My example used "nawk", with an 'n'.  Try

	awk 'BEGIN { foo() }
		function foo () { print "hi" }'

on your V.3.2 system and watch "awk" (no 'n') barf all over your screen.
We're talking two very different animals here.
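A script that has to cope with both animals can probe the awk it finds before relying on new-awk features such as user-defined functions. A minimal sketch, assuming a POSIX shell; the helper name `has_functions`, the `AWK` variable and the preference order nawk, gawk, awk are illustrative assumptions, not anything proposed in this thread:

```shell
#!/bin/sh
# Succeeds only if the awk named in $1 accepts a user-defined function,
# a new-awk feature that old awk does not support.
# has_functions is a hypothetical helper name.
has_functions() {
    # The newline between the function and the BEGIN rule keeps the
    # probe valid even under the strict V.3.2 nawk grammar.
    "$1" 'function foo() { return 1 }
BEGIN { exit !foo() }' </dev/null >/dev/null 2>&1
}

# Pick the first candidate that exists and passes the probe.
AWK=
for candidate in nawk gawk awk; do
    if command -v "$candidate" >/dev/null 2>&1 && has_functions "$candidate"; then
        AWK=$candidate
        break
    fi
done
```

The rest of the script would then run `"$AWK"` instead of a bare `awk`, and could stop with an error message if `AWK` is still empty after the loop.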

For whatever it's worth, the V.3.2 nawk man page said that in the "next
major release" nawk would become 'awk' and old awk would become 'oawk'.
This doesn't seem to have happened in V.4.  It probably never will in
System V.  4.4BSD will most likely ship gawk for its version of awk.

Next topic, please?

[ He's right.  I've cross-posted this to comp.unix.questions, with
  followups directed there. -- mod ]

Volume-Number: Volume 23, Number 72