[net.unix-wizards] awk vs. regular expressions starting with equal sign

henry@utzoo.UUCP (Henry Spencer) (12/12/84)

There is a fundamental lexical ambiguity in awk:  when you see "/=",
is this the divide-by-and-assign operator, or the start of a regular
expression which happens to begin with an equal sign?  Awk thinks it
is the operator, which means you can't start a regular expression with
an equal sign, ever.  To really write such a pattern, you have to resort
to schemes like "/.=/" or "/.*=/".  How annoying.  I can see no real fix.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry
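
One way to see why the workarounds help: a tokenizer that always takes the
longest operator match reads "/=" as a single token before the parser gets
any say.  The Python sketch below is a toy illustration of that rule, not
awk's actual lexer; the operator table and the function name
longest_match_tokens are made up for the example.

```python
# Hypothetical operator table for a toy awk-like lexer (an assumption,
# not taken from the awk source).  Longer operators are listed first so
# that the first hit is also the longest match.
OPERATORS = ["/=", "/", "="]


def longest_match_tokens(src):
    """Split src into tokens, always preferring the longest operator."""
    tokens = []
    i = 0
    while i < len(src):
        if src[i].isspace():
            i += 1
            continue
        for op in OPERATORS:
            if src.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            # Anything that is not a known operator becomes a
            # one-character token; good enough for this illustration.
            tokens.append(src[i])
            i += 1
    return tokens


print(longest_match_tokens("/=/"))    # ['/=', '/']
print(longest_match_tokens("/.=/"))   # ['/', '.', '=', '/']
print(longest_match_tokens("/[=]/"))  # ['/', '[', '=', ']', '/']
```

Putting any character between the slash and the equal sign keeps the two
from being merged into one operator, which is all that "/.=/" and "/.*=/"
are doing.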

crl@pur-phy.UUCP (Charles LaBrec) (12/13/84)

Try /\=/.  It worked for me.

Charles LaBrec
UUCP:		pur-ee!Physics:crl, purdue!Physics:crl
INTERNET:	crl @ pur-phy.UUCP

jonab@sdcrdcf.UUCP (Jonathan Biggar) (12/13/84)

In article <4770@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>There is a fundamental lexical ambiguity in awk:  when you see "/=",
>is this the divide-by-and-assign operator, or the start of a regular
>expression which happens to begin with an equal sign?  Awk thinks it
>is the operator, which means you can't start a regular expression with
>an equal sign, ever.  To really write such a pattern, you have to resort
>to schemes like "/.=/" or "/.*=/".  How annoying.  I can see no real fix.

You should use "/[=]/" instead; it is better.

Jon Biggar
{allegra,burdvax,cbosgd,hplabs,ihnp4,sdccsu3}!sdcrdcf!jonab

bobr@zeus.UUCP (Robert Reed) (12/15/84)

> There is a fundamental lexical ambiguity in awk:  when you see "/=",
> is this the divide-by-and-assign operator, or the start of a regular
> expression which happens to begin with an equal sign? 
>
> 				Henry Spencer @ U of Toronto Zoology
> 				{allegra,ihnp4,linus,decvax}!utzoo!henry

You can easily get around it by escaping the equal sign, as in

	awk '/\= / {...}' ...

I tried this on our 4.2BSD system and it seems to work just fine.

-- 
Robert Reed, Logic Design Systems Division, tektronix!teklds!bobr

henry@utzoo.UUCP (Henry Spencer) (12/16/84)

> Try /\=/.  It worked for me.

About six people have told me this, or variants of this.  I *thought*
it was clear from my original posting that I *know* how to work around
the problem.  My point was the absence of a *fix*, not the absence of a
workaround kludge.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

bobr@zeus.UUCP (Robert Reed) (12/17/84)

> > Try /\=/.  It worked for me.
> 
> About six people have told me this, or variants of this.  I *thought*
> it was clear from my original posting that I *know* how to work around
> the problem.  My point was the absence of a *fix*, not the absence of a
> workaround kludge.
> -- 
> 				Henry Spencer @ U of Toronto Zoology
> 				{allegra,ihnp4,linus,decvax}!utzoo!henry

    1.  It was not clear from your original posting that you knew that there
    	was a work-around, or I would not have responded as I did
	above.  I did think it strange coming from you, because your previous
	postings have given me an impression of competence and
	thoughtfulness.

    2.  Your quest for a fix implies that there is a problem.  That I'm not
    	so sure about.  Both C expression syntax and regular expression
	syntax are consistent in isolation from each other, and the conflict
	arises only in their concerted application.  The ambiguity is a 
	syntactic one that would require an irregularity in either regular
	expressions or arithmetic expressions to resolve it.  Better that
	they should remain pure, and the conflicts of their mutual
	application be documented rather than coming up with yet another
	mechanism, one more exception that must be remembered.
-- 
Robert Reed, Logic Design Systems Division, tektronix!teklds!bobr

henry@utzoo.UUCP (Henry Spencer) (12/20/84)

>     1.  It was not clear from your original posting that you knew that there
>     	was a work-around...

If you re-read my original posting, you'll see explicit mention of using
things like /.*=/ as workarounds.  Not as nice as /\=/ or /[=]/, but
definitely the same sort of animal.

>    2.  Your quest for a fix implies that there is a problem.  That I'm not
>   	so sure about.  Both C expression syntax and regular expression
>	syntax are consistent in isolation from each other, and the conflict
>	arises only in their concerted application.  The ambiguity is a 
>	syntactic one that would require an irregularity in either regular
>	expressions or arithmetic expressions to resolve it.  Better that
>	they should remain pure, and the conflicts of their mutual
>	application be documented rather than coming up with yet another
>	mechanism, one more exception that must be remembered.

Sure there's a problem.  A perfectly legitimate, valid regular expression
won't work.  (It does not help that awk gives an extremely cryptic message,
and that the problem is not documented anywhere.)  There is no fundamental
ambiguity here, because the two occur in different contexts.  Note that awk
does *not* get confused about whether "/" is a division operator or the
starting delimiter of a regular expression.  It really ought to be doing
the same for "/=".  Upon close examination of the awk source, I suspect
that this is not too hard.  I may have a try at a fix; stay tuned.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry
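
The context-dependent treatment described above can be sketched as a
tokenizer that looks at the previous token: right after an operand, "/" or
"/=" is an operator; anywhere else, "/" opens a regular expression.  This is
a minimal Python sketch under that assumption, not the awk source and not
the eventual fix; the token kinds and the name tokenize are hypothetical.

```python
# Token kinds that end an operand.  A "/" right after one of these is
# treated as division (or "/="); anywhere else it starts a regex.
# These kinds are assumptions made for this sketch.
OPERAND_KINDS = {"NAME", "NUMBER", "CLOSE"}


def tokenize(src):
    """Split a tiny awk-like fragment into (kind, text) tokens."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
            continue
        prev_kind = tokens[-1][0] if tokens else None
        if ch == "/" and prev_kind not in OPERAND_KINDS:
            # Regex context: no operand precedes, so this cannot be
            # division.  Scan to the closing unescaped slash.
            j = i + 1
            while j < len(src) and src[j] != "/":
                j += 2 if src[j] == "\\" else 1
            if j >= len(src):
                raise ValueError("unterminated regular expression")
            tokens.append(("REGEX", src[i:j + 1]))
            i = j + 1
        elif src.startswith("/=", i):
            tokens.append(("OP", "/="))
            i += 2
        elif ch.isalpha() or ch == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            tokens.append(("NAME", src[i:j]))
            i = j
        elif ch.isdigit():
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(("NUMBER", src[i:j]))
            i = j
        elif ch in ")]":
            tokens.append(("CLOSE", ch))
            i += 1
        else:
            tokens.append(("OP", ch))
            i += 1
    return tokens


for kind, text in tokenize("/=/ { n /= 2 }"):
    print(kind, text)
```

In this fragment the leading "/=/" comes out as one REGEX token, while the
"/=" after the name n is still the assignment operator.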