[comp.unix.questions] An awk question or two...

daniels@well.UUCP (Dan Smith, Social Mammal...) (09/12/87)

	Hello, I've been using awk for a few weeks, read through the
documentation, the "Supplemental Document for awk" by John Pierce (a
good paper to have!), and the chapter in "Unix Papers" called
"awk Power Plays" by David Huelsbeck. Still, I have a few problems that
I haven't figured out, and the documentation doesn't seem to
have anything appropriate...

	Command lines, passing variables to awk:

	awk -f comline.awk comvar=\"SUB\" ascii.h

	this is the file comline.awk:
	
	comvar	{ print }

	"ascii.h" has a couple of lines that have "SUB"

	and, the result:

323 bin/src :-} !aw
awk -f comline.awk comvar=\"SUB\" ascii.h
awk: syntax error near line 1
awk: bailing out near line 1
324 bin/src :-}

	Now, you might look at this and say, "Why don't you use grep?"

	I'm not done yet, it gets more fun...

	What I want to do is to pass *two* command line variables, and
have awk work on the lines within the two patterns -- such as:

	comlinevar1, comlinevar2	{ (some awk commands) }

	Here's where another problem springs up. The file I'll be awk'ing
most often holds a lot of info about programs; a sample of its format
looks like:

-c start of data...
-ds data
-f bhuff.bin
-n
-f bhuff.c
-f dirinfo
-c basename only files for data...
-b dirinfo
-b str_name
-c filename extensions for data...
-e bin
-e old
-c end of the directory data...
-de data
-c start of src...

	[...]ame> line, and pick out lines that start with "-f" (files)
in that directory - just that directory... I was thinking that I could
do something like: (comlinevar1="-ds", comlinevar2="-de")

	comlinevar1, comlinevar2 {
		/-f/	{ print $2 }
	}

	Obviously, this violates some things about awk. None of the
awk examples I saw seemed to address this sort of processing.
Maybe sed is better for this. The basic idea is: get a range of lines
from a file, and print selected ones from that range. I want to be
able to use command line arguments, so that I don't have to write
20-30 scripts that all do pretty much the exact same thing. I'm
probably missing something really obvious in the documentation;
I don't have a lot of examples to learn from. Even if you haven't
solved this particular sort of problem, if you've written some
stuff in awk that you don't mind mailing, I would love to get a copy.
I certainly hope there is a good way to do this in awk - it's
pretty useful for other types of text processing problems, and this
problem doesn't seem like one that would hit a limitation in awk.

	thanks much for any light you can shed on this!

			dan

dan smith, island graphics, marin co., ca  | "I am responsible for everything
uucp: ..!ucbvax!ucbcad!well!island!daniel  |  I've ever said since 1960!"
uucp: ..!ptsfa!unicom!daniel !well!daniels |  (415) 892 TANK (h) 491 1000 (w)

guy@sun.UUCP (09/12/87)

> 	awk -f comline.awk comvar=\"SUB\" ascii.h
> 
> 	this is the file comline.awk:
> 	
> 	comvar	{ print }
> 
> 	and, the result:
> 
> 323 bin/src :-} !aw
> awk -f comline.awk comvar=\"SUB\" ascii.h
> awk: syntax error near line 1
> awk: bailing out near line 1
> 324 bin/src :-}

"comvar" is not a legal pattern.  A pattern is either a keyword (such as BEGIN
or END), a relational expression, or a regular expression.  You can't just
throw a variable name there and expect "awk" to treat that as a regular
expression that matches all lines that contain that regular expression; "awk"
doesn't permit that.

What you want to do instead is to pass the program to "awk" *on the command
line*, and put it in double quotes so that shell variables are expanded before
the string is passed to "awk".  E.g., in a shell script, you could have the
command:

	awk "/$1/ { print }" ...

which would tell "awk" to print whatever lines were matched by the pattern
specified by the first argument to that shell script.
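Here is a minimal sketch of this approach applied to Dan's two-marker problem. It assumes a POSIX shell. The function name `print_range_files` and the sample lines are made up for illustration. Inside the double quotes, the shell expands `$start` and `$end` before awk ever sees the program. The backslash-escaped `\$1`, `\$2` and `\"` pass through to awk unchanged.

```shell
#!/bin/sh
# Hypothetical helper: print_range_files START END [FILE...]
# Prints field 2 of every "-f" line from a line matching ^START through
# the next line matching ^END (an awk range pattern).
# START and END are pasted into the program text by the shell, so they act
# as regular expressions and must not contain "/", '"' or backslashes.
print_range_files() {
    start=$1
    end=$2
    shift 2
    # \$1 and \$2 reach awk literally; $start and $end are expanded by the shell.
    awk "/^$start/, /^$end/ { if (\$1 == \"-f\") print \$2 }" "$@"
}

# Usage on part of the sample file from the question; prints
# bhuff.bin, bhuff.c and dirinfo, one per line.
printf '%s\n' '-c start of data...' '-ds data' '-f bhuff.bin' '-n' \
    '-f bhuff.c' '-f dirinfo' '-b str_name' '-de data' |
    print_range_files -ds -de
```

The price of this convenience is quoting: anything the shell splices in becomes awk source text. An argument containing a slash or a double quote therefore changes the program itself.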
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

dph@beta.UUCP (David P Huelsbeck) (09/14/87)

In article <27817@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>In article <3931@well.UUCP> daniels@well.UUCP (Dan Smith, Social Mammal...) 
> writes:
>> 	awk -f comline.awk comvar=\"SUB\" ascii.h
>> 
>> 	this is the file comline.awk:
>> 	
>> 	comvar	{ print }
>> 
>> 	and, the result:
>> 
 [ usual wonderfully helpful awk error messages deleted ]
>
>"comvar" is not a legal pattern.  A pattern is either a keyword (such as BEGIN
>or END), a relational expression, or a regular expression.  You can't just
>throw a variable name there and expect "awk" to treat that as a regular
>expression that matches all lines that contain that regular expression; "awk"
>doesn't permit that.

TRUE.

>What you want to do instead is to pass the program to "awk" *on the command
>line*, and put it in double quotes so that shell variables are expanded before
>the string is passed to "awk".  

Maybe, maybe not.

>	Guy Harris

Guy is correct. But if you look at my spreadsheet calculator example,
you'll see that in this specific case there is a way to do what you want.
I haven't yet figured out a way to do this using real regexes and
pattern matching, but I'll bet that with a sufficiently twisted method
you could do it somehow. After all, that's the fun of awk. ;-)

Try this:

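	#comline.awk (script to do something)

	# arg1 and arg2 come from var=value assignments on the command line
	$1 == arg1, $1 == arg2 {
		# inside the range, print the file name from each "-f" line
		if ($1 ~ /-f/) print $2
	}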

Then awk -f comline.awk arg1="-ds" arg2="-de" ascii.h

and you get:

bhuff.bin
bhuff.c
dirinfo
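The same script can live inside a shell wrapper, so one command covers every directory. That was Dan's goal of avoiding 20-30 near-identical scripts. This is a sketch assuming a POSIX shell and awk; the wrapper name `dir_files` is hypothetical. The program is single-quoted, so the shell never touches it. The markers reach awk through the same `var=value` command-line assignments used above.

```shell
#!/bin/sh
# Hypothetical wrapper around the comline.awk logic: dir_files START END [FILE...]
dir_files() {
    start=$1
    end=$2
    shift 2
    # The single-quoted program is passed to awk untouched. arg1 and arg2 are
    # assigned from the command line before awk reads the input that follows.
    awk '$1 == arg1, $1 == arg2 { if ($1 ~ /-f/) print $2 }' \
        arg1="$start" arg2="$end" "$@"
}

# Usage: prints bhuff.bin, bhuff.c and dirinfo, one per line.
printf '%s\n' '-ds data' '-f bhuff.bin' '-n' '-f bhuff.c' '-f dirinfo' \
    '-de data' | dir_files -ds -de
```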

The important thing to note here is that whenever you use ~ or !~, one
of the operands must be a true pattern, i.e. /regex/.
This is too bad, but I think it is a result of the fact that awk "compiles"
regexes during the script compilation phase. I suppose it does this for
reasons of speed, but I'm not sure enough to say. So while putting $1 == xyz
in the pattern field is OK, $1 ~ xyz will cause awk to barf. This is true
everywhere, so "if ($1 ~ "-f") print $2" will net you a syntax error as well.
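That restriction belongs to the awk being discussed here. In POSIX awk, the right-hand operand of `~` and `!~` may be any expression, and awk compiles its string value as a regular expression at run time (a "dynamic regex"). Under such an awk, `if ($1 ~ "-f") print $2` is accepted as well. This sketch assumes a POSIX-conforming awk. The helper name `field_match` is hypothetical, and the `-v` option assigns the variable before the program starts.

```shell
#!/bin/sh
# Assumes a POSIX-conforming awk, where the right operand of ~ may be a
# string; awk treats its value as an extended regular expression.
# Hypothetical helper: field_match RE [FILE...]
# Prints field 2 of each line whose first field matches RE.
field_match() {
    re=$1
    shift
    awk -v re="$re" '$1 ~ re { print $2 }' "$@"
}

# Usage: the string "^-f$" plays the role of /^-f$/; prints bhuff.c only.
printf '%s\n' '-f bhuff.c' '-ds data' '-b dirinfo' | field_match '^-f$'
```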

This is the best AWK solution I could come up with. If you need something
more, using shell variables is the best way I can think of just now.
If you can figure out a way around this limitation without using the shell,
I'd like to see it.

	David Huelsbeck
	dph@LANL.GOV
	{cmcl2,ihnp4}!lanl!dph

#include <standard_disclaimers.h>


	

breck@aimt.UUCP (Robert Breckinridge Beatie) (09/15/87)

In article <27817@sun.uucp>, guy@sun.uucp (Guy Harris) writes:
> > 	awk -f comline.awk comvar=\"SUB\" ascii.h
> > 
> > 	this is the file comline.awk:
> > 	
> > 	comvar	{ print }
> > 
> 
> "comvar" is not a legal pattern.  A pattern is either a keyword (such as BEGIN
> or END), a relational expression, or a regular expression.  You can't just
> throw a variable name there and expect "awk" to treat that as a regular
> expression that matches all lines that contain that regular expression; "awk"
> doesn't permit that.
> 

Actually, according to "Awk - A Pattern Scanning and Processing Language"
by Aho, Kernighan, and Weinberger (Second Edition), "A variety of expressions
may be used as patterns: regular expressions, arithmetic relational expressions,
*string-valued expressions*, and arbitrary boolean combinations of these."  Now,
I haven't been able to make use of variables as "string-valued expressions" in
the pattern part of an awk statement, but I haven't been able to find anything
that says they can't be.  In fact, in the action part of an awk statement a
variable is a valid string-valued expression, so shouldn't it also be valid in
the pattern part of the statement?

I think the question boils down to, "How can I force awk to use the value of
the variable comvar instead of the string that is the variable's name?"  Or
is my interpretation of the documentation flawed?  God knows I've been bitten by
my too-liberal interpretation of documentation before.
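POSIX awk is a later standard than the awk discussed in this thread. Under a POSIX-conforming awk, a bare variable used as a pattern is legal, but awk tests it for truth and does not match it as a regular expression. A non-empty string is true, so `comvar { print }` prints every line. Matching against the variable's value needs an explicit `$0 ~ comvar`. This sketch assumes such an awk, and the input lines are invented stand-ins for ascii.h.

```shell
#!/bin/sh
# Assumes a POSIX-conforming awk. The input lines below are invented
# stand-ins for ascii.h.
ascii_sample() {
    printf '%s\n' '#define ESC 033' '#define SUB 032'
}

# Truth test: comvar is the non-empty string "SUB", so every line is printed.
ascii_sample | awk 'comvar { print }' comvar=SUB

# Regex match on the variable's value: only the SUB line is printed.
ascii_sample | awk '$0 ~ comvar { print }' comvar=SUB
```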

-- 
Breck Beatie
uunet!aimt!breck

guy@sun.uucp (Guy Harris) (09/15/87)

> Actually, according to: "Awk - A Pattern Scanning and Processing Language"
> by Aho Kernighan and Weinberger (Second Edition), "A variety of expressions
> may be used as patterns: regular expressions, arithmetic relational
> expressions, *string-valued expressions*, and arbitrary boolean
> combinations of these."  Now I haven't been able to make use of variables
> as "string-valued expressions" in the pattern part of an awk statement,
> but I haven't been able to find anything that says they can't be.

In the "awk" document in the S5R3 Programmer's Guide (I've found the S5 "awk"
documentation to be much nicer than the old documentation, and the S5R3 "awk"
isn't much changed from the "awk" that went out on the V7 addendum tape, which
is basically the same one that's in 4BSD), the list above is missing the
phrase "string-valued expressions".  Sounds like the documentation you cite
either 1) described a version of "awk" other than the one most people have or
2) has an error in it.

> I think the question boils down to, "How can I force awk to use the value of
> the variable: comvar instead of the string that is the variable's name?"

The answer is "you can't; sorry, the documentation lied".
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

dph@beta.UUCP (David P Huelsbeck) (09/16/87)

In article <90@aimt.UUCP> breck@aimt.UUCP (Robert Breckinridge Beatie) writes:
>In article <27817@sun.uucp>, guy@sun.uucp (Guy Harris) writes:
>> > 	awk -f comline.awk comvar=\"SUB\" ascii.h
>> > 
>> > 	this is the file comline.awk:
>> > 	
>> > 	comvar	{ print }
>> > 
>> 
>> "comvar" is not a legal pattern.  A pattern is either a keyword 
>> (such as BEGIN or END), a relational expression, or a regular expression. 

 [....]

>Actually, according to: "Awk - A Pattern Scanning and Processing Language"
>by Aho Kernighan and Weinberger (Second Edition), "A variety of expressions
>may be used as patterns: regular expressions, arithmetic relational 
>expressions,
>*string-valued expressions*, and arbitrary boolean combinations of these."  

[...]

>  Or
>is my interpretation of documentation flawed?  God knows I've been bitten by
>my too-liberal interpretation of documentation before.
>
>-- 
>Breck Beatie
>uunet!aimt!breck

No (at least I don't think so).

The documentation does say that.

However, if you read the abstract page you'll find:

"*Awk* patterns may include arbitrary boolean combinations of ..."
                                      ^^^^^^^^^^^^^^^^^^^^

If you think about it a while, this makes sense. When each record is read,
awk runs down the list of pattern-action pairs, looks at each pattern,
and then either does the action or doesn't. It either does it or it doesn't,
so it's boolean, or at least it needs to make a boolean type of decision.
What is misleading is the fact that awk allows a pattern to be a simple
regex in slashes, or to be nonexistent, in addition to the clearly boolean-valued
patterns like "a == b". But if you think of the nonexistent or default
pattern as shorthand for TRUE, or 1 == 1, or whatever, and the /regex/
pattern as "$0 ~ /regex/", then it is clear why a variable or string-valued
expression will not work. It's really not a "syntax error" as awk claims,
but rather a semantic error of the "type mismatch" variety.
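You can check these equivalences directly. This sketch assumes any POSIX shell and awk and uses invented input lines. Each pair of commands below should print the same output.

```shell
#!/bin/sh
# Invented sample input.
ascii_sample() {
    printf '%s\n' '#define ESC 033' '#define SUB 032'
}

# A /regex/ pattern is shorthand for $0 ~ /regex/.
ascii_sample | awk '/SUB/ { print }'
ascii_sample | awk '$0 ~ /SUB/ { print }'

# An omitted pattern behaves like an always-true comparison.
ascii_sample | awk '{ print }'
ascii_sample | awk '1 == 1 { print }'
```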

What is needed to make awk behave the way we've been talking about is a
new built-in function like:

	match(str,expr)

which is boolean-valued, where "str" may be any string and "expr" may be
any string which is itself a valid regex. (NOTE: I've never had occasion
to use a function in a pattern, but from the lex source it looks like it
ought to work.)
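POSIX awk has a built-in function much like this one. `match(s, ere)` returns the 1-based position of the leftmost match of `ere` in `s`, or 0 if there is none, and it sets `RSTART` and `RLENGTH`. A non-zero number counts as true, so you can use the call directly as a pattern, and `ere` may be a string held in a variable. This sketch assumes a POSIX-conforming awk; the helper name `awk_match` is hypothetical.

```shell
#!/bin/sh
# Assumes a POSIX-conforming awk.
# Hypothetical helper: awk_match PATTERN [FILE...]
# Prints "position: line" for each line containing a match for PATTERN.
awk_match() {
    pat=$1
    shift
    # match() returns 0 (false) when there is no match, so such lines are
    # skipped; otherwise RSTART holds the start position of the match.
    awk -v pat="$pat" 'match($0, pat) { print RSTART ": " $0 }' "$@"
}

# Usage: prints "5: -ds data" (the "-n" line has no match).
printf '%s\n' '-ds data' '-n' | awk_match 'data'
```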

The problem is you can't define new functions in standard awk. They're
built in at the lowest level, just like + and - and all the rest. I have
been told there is a new awk out with subroutines/function calls and the
like. I haven't seen it. I'd like to, but I'd be more inclined to just
rewrite awk the way I'd like it than to pay for a new version. There is
a program called "bawk" which is similar to awk, and its source is available
for free. I've looked at it but never used it, so I can't comment further.

So the way to make awk work the way you'd like is to rewrite it in some
fashion, or just use the shell as Guy suggested. The true power of
UNIX is using your tools in harmony.

The stream-of-bytes-in-stream-of-bytes-out paradigm is what 
makes UNIX UNIX.

Use it.

	David Huelsbeck
	dph@lanl.gov
	{cmcl2,ihnp4}!lanl!dph

Sorry for going on so. Say, when will we be seeing comp.awk.questions?

#include <standard_disclaimers>