[comp.unix.wizards] awk arguments ... and more tricks of the awk-masters :-)

martin@mwtech.UUCP (Martin Weitzel) (07/25/90)
In article <290@sun13.scri.fsu.edu> mayne@VSSERV.SCRI.FSU.EDU (William (Bill) Mayne) writes:
>I have had a problem with the syntax of the awk command for some
>time. I quote from the man pages for awk from SunOS:
>> 
>> SYNOPSIS
>>      awk [ -f program-file ] [ -Fc  ]  [  program  ]  [  variable
>>      =value ... ] [ filename...]
>> 
[description of some common problems with awk deleted]

Let's start with a short summary of how awk treats command-line arguments
(some tricks that even the more advanced readers might not yet have
discovered follow later ...):

1) awk gives an "="-sign in an argument precedence over an existing file.
   This may not be what you want sometimes, but it is the way awk works.
2) awk is a bit stupid in that it counts command-line assignments the
   same as regular file arguments when it has to decide whether standard
   input should be read. (Like so many unix programs, awk reads standard
   input only if there are *no* arguments - e.g. think of "some-command | lp"
   compared to "lp some-file another-file".) Therefore, if you want to
   have awk read standard input *and* use command-line assignment to
   variables, you must explicitly write a hyphen as an argument after the
   assignment. awk treats this as a synonym for "read stdin now". (You can
   also mix "-" with regular files.)
3) Variables passed to awk from the command line are *not* available
   until awk processes the *immediately* following file argument.
   There is no direct way to make them available in the BEGIN section
   (which is a pity in some situations and turns out to be useful
   in others), but there is a possible workaround:

	awk 'init == 0 {
		..... # do initialization based on
		..... # command-line assignments
		init++
	}
	..... # more stuff
	' foo=bar .....
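
A minimal sketch of points 2) and 3), assuming a POSIX-style awk is
installed as "awk". The function names show_timing and init_demo and the
sample input are made up for illustration:

```shell
# Hypothetical helper: shows when a command-line assignment becomes visible.
show_timing() {
	printf 'line1\n' | awk '
		BEGIN { print "BEGIN: foo=[" foo "]" }
		      { print "main: foo=[" foo "]" }
	' foo=bar -
}

# Hypothetical helper: the init == 0 workaround with a concrete body.
init_demo() {
	printf 'a\nb\n' | awk '
		init == 0 {
			prefix = foo ":"	# initialization based on foo
			init++
		}
		{ print prefix $0 }
	' foo=bar -
}

show_timing
init_demo
```

In show_timing, foo is still empty inside BEGIN and set once the first
input line is processed. The explicit "-" makes awk read standard input
after the assignment.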

Command-line assignments are only *one* way to pass variables and other
stuff from outside into an awk program. If you use the common technique
of writing your awk program as a shell script and supplying the "real"
awk program as the first argument, you can still pass things into this
program if you close the single-quoted argument string for a moment.
Look at the following to understand a moderately complex form of it:

	awk '..............var = "'"$var"'"; ............'
	     A-------------------ABCD--DCBA-------------A

Most of the above (A-A) is quoted for the shell and hence passed
unchanged to awk. THIS INCLUDES TWO OF THE DOUBLE QUOTES - LOOK
CLOSELY! What awk will hence see is an assignment of a string
constant to var: .... var = "stuff inserted by the shell"; .....

The two inner single quotes (B) temporarily end the quoting for the shell,
hence the shell interprets what is contained; in particular it recognizes
$var (D-D) and substitutes its current value. Again: note that
$var denotes the shell variable here - all that awk sees (as part of its
program) is what the shell substitutes for $var!

The double quotes not yet mentioned (C) are only necessary if the
substitution of $var may yield blanks (strictly speaking: IFS characters).
These would split the above into two arguments for awk, which in
turn would result in a syntax error, as awk considers only its first
argument as the program and this would be incomplete. (As double quotes
around a shell variable generally do no harm, I have developed the habit
of writing them in almost every case, not only in the one above.)
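
The quoting scheme can be tried out with a small sketch like this one.
The function name show_var and the BEGIN-only program are made up for
illustration; only the var = "'"$var"'" part is the technique itself:

```shell
# Hypothetical helper: splices the shell variable var into the awk program.
show_var() {
	var=$1
	# awk receives:  BEGIN { var = "<value of $var>"; print "awk sees: " var }
	awk 'BEGIN { var = "'"$var"'"; print "awk sees: " var }' </dev/null
}

show_var 'two words'
```

Because of the inner double quotes (C), the blank in "two words" does not
split the program into two arguments.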

There still remains a minor problem with the above: if substituting
$var yields a double quote, you get a syntax error from awk. Why? Look
again at what awk really receives as its first argument: an assignment
of a *string constant* to a variable, and a double quote contained in a
string constant must be guarded with a backslash! On the other hand,
with awk '............' var="$var" there is no such problem, since awk
knows about the special nature of command-line assignments. (There
is also no problem for the shell with a double quote embedded in a
shell variable, as the shell looks for double quotes *before* variable
substitution occurs and no longer after that step.)
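
A small sketch that contrasts the two ways of passing a value containing
one double quote. The function names via_splice and via_assignment and
the sample value are made up for illustration, and the exact error
message depends on your awk:

```shell
# Hypothetical helper: value spliced into the program text.
# A lone double quote in the value leaves an unterminated string constant.
via_splice() {
	var=$1
	awk 'BEGIN { var = "'"$var"'"; print var }' </dev/null 2>/dev/null
}

# Hypothetical helper: same value passed as a command-line assignment.
# One empty input line is fed in so that the main rule runs once.
via_assignment() {
	printf '\n' | awk '{ print var }' var="$1" -
}

via_splice '5" disk' || echo "splice: awk rejected the program"
via_assignment '5" disk'
```

Note that awk still interprets backslash sequences in a command-line
assignment, so values containing backslashes need extra care either way.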

One final remark to those who really like awk (like me :-)) and find it
useful for a lot of projects: you can use what you just learned to build
some kind of #include feature and have "libraries" of useful awk program
fragments. Just use

	awk '
		..... special stuff .....
		'"`cat awklib/proc1`"'	# standard stuff
		'"`cat awklib/proc2`"'	# standard stuff
		.... other special stuff ....
		'"`cat awklib/proc3`"'	# standard stuff
		.... etc ...
	' command-line arguments as desired

Of course you must not exceed the maximum program argument size, which is
around 8..10 KB for many variants of UNIX (further depending on how
much is currently in your environment variables).
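
Here is a self-contained sketch of such an "include". Everything in it
is made up for illustration: the function name include_demo, the
temporary directory standing in for awklib, and the library fragment
max. It assumes an awk with user-defined functions and a shell that
understands $(...), which is the newer spelling of the backquotes above:

```shell
# Hypothetical demo: a one-function "library" file is spliced into the program.
include_demo() {
	lib=$(mktemp -d) || return 1

	# Library fragment, stored in its own file.
	cat > "$lib/max" <<'EOF'
function max(a, b) { return a > b ? a : b }
EOF

	# The shell pastes the fragment into the program text before awk starts.
	printf '3 7\n9 2\n' | awk '
		{ print max($1, $2) }
		'"$(cat "$lib/max")"'
	'

	rm -rf "$lib"
}

include_demo
```

awk never sees the file name; it only sees one program text in which the
function definition already appears.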

You are not at all limited to simply "cat"-ing a file into the script
here - you can do any fancy stuff, including having another call to awk
which generates the program for the first one ... but of course you
should have mastered the difficulties of complicated quoting then :-)

Another, not so obvious trick is the following:

	awk '
		........
		'"${feature:+stuff}"'
		........
	'

which expands "stuff" into your awk script if the shell variable
$feature is not empty (consider it some kind of #if - #endif).
But be warned: using the above trick may not only confuse whoever
has to maintain this code later, but is also of limited use,
since "stuff" may not contain a closing curly bracket(%).
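
A minimal sketch of the conditional expansion. The function name
number_if and the awk statement used as "stuff" are made up for
illustration; "stuff" was chosen so that it contains no "}", no "$" and
no quotes:

```shell
# Hypothetical helper: number the lines only if the first argument is non-empty.
number_if() {
	feature=$1
	printf 'a\nb\n' | awk '{
		'"${feature:+line = NR OFS;}"'
		print line $0
	}'
}

number_if ''	# plain lines
number_if yes	# numbered lines
```

With an empty $feature the statement is simply missing from the program,
so awk never executes an extra test for it.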

Nevertheless, I think there are situations where the above technique is
useful, and I show you an excerpt from one of my scripts where I use it.
As the original script is rather complex, I'll show and explain it only
partially: in most cases this script processes some text file completely,
but the user has the choice to specify the lines where processing should
start and end in a context-sensitive manner through regular expressions
(bpat, epat). Furthermore, a special action can be triggered by lines
of the processed text file which contain a certain pattern. Again, to
allow for maximum flexibility, I wanted to let the user specify the
desired pattern as a regular expression (ppat).

On the other hand, I wanted to avoid the regular expression matching
step in the (common) situation where the user has not specified
regular expressions at all. Hence some lines in the following awk
program are only 'conditional':

Excerpt from shell script (lines not shown denoted with ...):
----------------------------------------------------------------------
...
# The following options are supported:
...
#	-b pat	start output on a line containing "pat",
#		including this line (Default: from beginning)
#	-e pat	end output on a line containing "pat"
#		excluding this line (Default: up to end)
#	-p pat	before lines containing "pat", page breaks
#		may occur (Default: no page breaks)
# "pat" may be an "extended regular expression" as supported by awk.
...
# CAVEATS:
# "pat"s are not checked before they are used (processing may have
# started, before problems are detected).
...
sk=0
for p
do
	case $sk in
	1) shift; sk=0; continue
	esac
	case $p in
	...
	-b)	shift; bpat=$1; sk=1 ;;
	-e)	shift; epat=$1; sk=1 ;;
	-p)	shift; ppat=$1; sk=1 ;;
	--)	shift; break ;;
	*)	break
	esac
done

awk '
...
# limit selection range
{
	'${epat:+' if ($0 ~ /'"$epat"'/) skip = 1; '}'
	'${bpat:+' if ($0 ~ /'"$bpat"'/) skip = 0; '}'
	if (skip) next;
}
...
# finally print this line
{
	'${ppat:+' if ($0 ~ /'"$ppat"'/) printf("%s", PBRK); '}'
	print
}
' $*
-----------------------------------------------------------------------
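
To experiment with the conditional lines, a cut-down sketch of the range
selection might look like this. The function name select_range is made
up, and the initial value of skip is an assumption, since that part of
the original script is not shown in the excerpt:

```shell
# Hypothetical cut-down version: print the lines from bpat (inclusive)
# up to epat (exclusive); with empty patterns, print everything.
select_range() {
	bpat=$1 epat=$2

	# Assumption: start in skip mode only if a begin pattern was given.
	start=0
	[ -n "$bpat" ] && start=1

	awk '
	BEGIN { skip = '"$start"' }
	{
		'${epat:+' if ($0 ~ /'"$epat"'/) skip = 1; '}'
		'${bpat:+' if ($0 ~ /'"$bpat"'/) skip = 0; '}'
		if (skip) next
		print
	}'
}

printf 'a\nSTART\nb\nEND\nc\n' | select_range START END
```

As in the excerpt, a pattern containing a "/" would end the regular
expression early, so the patterns are not checked here either.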

%: It turns out that something in the above raises a new question for
   you unix wizards: does anybody know a way to quote a "}" in the
   context of a conditional expansion of a shell variable? What I'm
   looking for is the following

	echo ${A:+'{}'}    # echo '{}' only if A is not empty

   which works for non-empty A but fails otherwise: a single '}'
   is echoed where IMHO nothing should appear. Note that the above
   works if I use some intermediate variable: B={}; echo ${A+$B}
   Or is it a bug in the shell that the above doesn't work as
   expected?
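
   The intermediate-variable workaround, written out as a small sketch.
   The function name braces_if_set is made up, and ":+" is used instead
   of "+" so that an empty A counts the same as an unset one:

```shell
# Hypothetical helper: print {} only if the first argument is non-empty.
braces_if_set() {
	A=$1
	B='{}'		# the braces live in a variable, not in the ${...} word
	echo "${A:+$B}"
}

braces_if_set something	# prints the braces
braces_if_set ''	# prints an empty line
```

   Since no "}" appears literally inside the ${...} word, the shell has
   no chance to mistake it for the end of the expansion.
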
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83