martin@mwtech.UUCP (Martin Weitzel) (07/25/90)
In article <290@sun13.scri.fsu.edu> mayne@VSSERV.SCRI.FSU.EDU (William (Bill) Mayne) writes:
>I have had a problem with the syntax of the awk command for some
>time. I quote from the man pages for awk from SunOS:
>>
>> SYNOPSIS
>>      awk [ -f program-file ] [ -Fc ] [ program ]
>>          [ variable=value ... ] [ filename...]
>>
[description of some common problems with awk deleted]

Let's start with a short summary of how awk treats command-line
arguments (some tricks that even the more advanced readers might
not yet have discovered follow later ...):

1) An argument containing an "="-sign is treated as an assignment,
   even if a file of that name exists. This may not be what you want
   sometimes, but it is the way awk works.

2) awk is a bit stupid in that it counts command-line assignments
   the same as regular file arguments when it has to decide whether
   standard input should be read. (Like so many unix programs, awk
   reads standard input only if there are *no* arguments - e.g. think
   of "some-command | lp" compared to "lp some-file another-file".)
   Therefore, if you want awk to read standard input *and* use
   command-line assignments to variables, you must explicitly write
   a hyphen as an argument after the assignment. awk treats this as
   a synonym for "read stdin now". (You can also mix "-" with regular
   files.)

3) Variables passed to awk from the command line are *not* available
   until awk processes the *immediately* following file argument.
   There is no direct way to make them available in the BEGIN section
   (which is a pity in some situations and turns out to be useful in
   others), but there is a possible workaround:

        awk 'init == 0 {
                .....   # do initialization based on
                .....   # command-line assignments
                init++
             }
             .....      # more stuff
        ' foo=bar .....

Command-line assignments are only *one* way to pass variables and
other stuff from outside into an awk program.
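To make the three points about command-line assignments concrete,
here is a small sketch. It assumes a POSIX-style sh and awk; the
variable names (prefix, foo, start) and the sample input are made up
for illustration only.

```shell
# Rule 2: the explicit "-" tells awk to read stdin after the assignment.
# (Some newer awks read stdin here even without it; writing it is harmless.)
out1=$(printf 'a\nb\n' | awk '{ print prefix $0 }' prefix='> ' -)

# Rule 3: the assignment is not yet visible in BEGIN, only in the main rules.
out2=$(printf 'x\n' | awk '
    BEGIN { print "BEGIN sees [" foo "]" }
          { print "main sees [" foo "]" }
' foo=bar -)

# Workaround: do the "initialization" on the first input line instead.
out3=$(printf 'l1\nl2\n' | awk '
    init == 0 { n = start; init++ }   # runs once, start is set by now
              { print n++ ": " $0 }
' start=10 -)

printf '%s\n' "$out1" "$out2" "$out3"
```

The init rule only fires when there is at least one input line, so
it is no full replacement for BEGIN on empty input.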
If you use the common technique of writing your awk program as a
shell script and supplying the "real" awk program as the first
argument, you can still pass things into this program if you close
the single-quoted argument string for a moment. Look at the following
to understand a moderately complex form of it:

    awk '..............var = "'"$var"'"; ............'
        A--------------------ABCD--DCBA--------------A

Most of the above (A-A) is quoted for the shell and hence passed
unchanged to awk. THIS INCLUDES TWO OF THE DOUBLE QUOTES - LOOK
CLOSELY! What awk will hence see is an assignment of a string
constant to var:

    .... var = "stuff inserted by the shell"; .....

The two inner single quotes (B) temporarily end the quoting for the
shell, hence the shell interprets what is contained between them; in
particular it recognizes $var (D-D) and substitutes what it is
currently set to. Again: note that $var denotes the shell variable
here - visible to awk (as part of its program) is only what the shell
substitutes for $var!

The double quotes not yet mentioned (C) are only necessary if the
substitution of $var may yield blanks (strictly speaking: IFS chars).
This would split the above and make two arguments for awk, which in
turn would result in a syntax error, as awk considers only its first
argument as the program and this becomes incomplete. (As double
quotes around a shell variable generally do no harm, I developed the
habit of writing them in almost every case, not only in the above
one.)

There still remains a minor problem with the above: if substituting
$var yields a double quote, you get a syntax error from awk. Why?
Look again at what awk really receives as its first argument: an
assignment of a *string constant* to a variable, and a contained
double quote must be guarded with a backslash in a string constant!
On the other hand, with

    awk '............' var="$var" -

there is no such problem, since awk knows about the special nature of
command-line assignments.
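The difference between the two ways of passing $var can be tried out
with a sketch like the following. It assumes a POSIX-style sh and
awk; the sample values and the helper variable names (spliced,
spliced_ok, assigned) are invented for this illustration.

```shell
# Splicing the shell variable into the awk program text works for
# harmless values, blanks included (thanks to the inner double quotes).
var='two words'
spliced=$(awk 'BEGIN { v = "'"$var"'"; print v }')

# With a double quote in the value, awk receives the program text
#     BEGIN { v = "3.5" disk"; print v }
# whose last string constant is never closed, so awk rejects it.
var='3.5" disk'
if awk 'BEGIN { v = "'"$var"'"; print v }' >/dev/null 2>&1
then spliced_ok=yes
else spliced_ok=no
fi

# The command-line assignment passes the same value through unharmed.
# (echo supplies one input line, "-" makes awk read it from stdin.)
assigned=$(echo | awk '{ print v }' v="$var" -)

printf '%s\n' "$spliced" "$spliced_ok" "$assigned"
```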
(There is also no problem for the shell with a double quote embedded
in a shell variable, as the shell looks for double quotes *before*
variable substitution occurs and not any more after that step.)

One final remark to those who really like awk (like me :-)) and find
it useful for a lot of projects. You can use what you just learned to
build some kind of #include feature and have "libraries" of useful
awk program fragments. Just use

    awk '
        ..... special stuff .....
    '"`cat awklib/proc1`"'  # standard stuff
    '"`cat awklib/proc2`"'  # standard stuff
        .... other special stuff ....
    '"`cat awklib/proc3`"'  # standard stuff
        .... etc ...
    ' command-line arguments as desired

Of course you must not exceed the maximum program argument size,
which is around 8..10 KB for many variants of UNIX (further depending
on how much is currently in your environment variables). You are not
at all limited to simply "cat"-ing a file into the script here - you
can do any fancy stuff, including having another call to awk which
generates the program for the first one ... but of course you should
have mastered the difficulties of complicated quoting then :-)

Another not so obvious trick is the following:

    awk '
        ........
        '"${feature:+stuff}"'
        ........
    '

which expands "stuff" into your awk script if the shell variable
$feature is not empty (consider it as some kind of #if - #endif). But
be warned: using the above trick may not only confuse whoever has to
maintain this code later, it is also of limited use, since "stuff"
may not contain a closing curly bracket(%). Nevertheless I think
there are situations where the above technique is useful, and I show
you an excerpt from one of my scripts where I use it.
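Before the real excerpt, here is a minimal, self-contained sketch of
the ${feature:+stuff} trick on its own. It assumes a POSIX-style sh
and an awk that has toupper(); the function name shout and the
variable name upper are hypothetical and not taken from the script
below.

```shell
# shout FLAG: copy stdin to stdout; if FLAG is non-empty, the optional
# awk statement is spliced into the program and upper-cases each line.
shout() {
    upper=$1
    awk '
    {
        line = $0
        '"${upper:+line = toupper(line);}"'
        print line
    }'
}

echo 'hello awk' | shout yes    # optional statement included
echo 'hello awk' | shout ''     # optional statement left out
```

The spliced-in statement deliberately contains no closing curly
bracket, which sidesteps the limitation mentioned above.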
As the original script is rather complex, I'll show and explain it
only partially. In most cases this script processes some text file
completely, but the user has the choice to specify the lines where
processing should start and end in a context-sensitive manner through
regular expressions (bpat, epat). Furthermore, a special action can
be triggered by lines of the processed text file which contain a
certain pattern. Again, to allow for maximum flexibility, I wanted to
let the user specify the desired pattern as a regular expression
(ppat). On the other hand, I wanted to avoid the regular expression
matching step in the (common) situation where the user has not
specified regular expressions at all. Hence some lines in the
following awk program are only 'conditional'.

Excerpt from shell script (lines not shown denoted with ...):
----------------------------------------------------------------------
...
# The following options are supported:
...
#       -b pat  start output on a line containing "pat",
#               including this line (Default: from beginning)
#       -e pat  end output on a line containing "pat"
#               excluding this line (Default: upto end)
#       -p pat  before lines containing "pat", page breaks
#               may occur (Default: no page breaks)
# "pat" may be an "extended regular expression" as supported by awk.
...
# CAVEATS:
#       "pat"s are not checked before they are used (processing may have
#       started, before problems are detected).
...
sk=0
for p
do
        case $sk in
        1) shift; sk=0; continue
        esac
        case $p in
        ...
        -b) shift; bpat=$1; sk=1 ;;
        -e) shift; epat=$1; sk=1 ;;
        -p) shift; ppat=$1; sk=1 ;;
        --) shift; break ;;
        *) break
        esac
done

awk '
...
# limit selection range
{
'${epat:+'
        if ($0 ~ /'"$epat"'/) skip = 1;
'}'
'${bpat:+'
        if ($0 ~ /'"$bpat"'/) skip = 0;
'}'
        if (skip) next;
}
...
# finally print this line
{
'${ppat:+'
        if ($0 ~ /'"$ppat"'/) printf("%s", PBRK);
'}'
        print
}
' $*
----------------------------------------------------------------------

%: It turns out that something of the above becomes a new question
for you unix wizards: does anybody know a way to quote a "}" in the
context of a conditional expansion of a shell variable? What I'm
looking for is the following

        echo ${A:+'{}'}   # echo '{}' only if A is not empty

which works for non-empty A-s but fails otherwise: a single '}' is
echoed where IMHO nothing should appear. Note that the above works if
I use some intermediate variable:

        B={}; echo ${A+$B}

Or is it a bug of the shell that the above doesn't work as expected?
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83