[comp.unix.wizards] serious awk bug

tps@chem.ucsd.edu (Tom Stockfisch) (03/14/90)

The following awk script doesn't behave properly:

#! /bin/sh

awk '/^a*[^b]/	{  print "1:", $0 }
/^a*b/ 	{ print "2:", $0 }
'

When given the following input


b
ab


It produces the following output


2: b
1: ab
2: ab


Basically, the line "ab" should match only rule 2, but it matches both
rules.  The following script:


#! /bin/sh

awk '/^a*[^b]c/	{  print "1:", $0 }
/^a*bc/ 	{ print "2:", $0 }
'


works, producing the output

2: bc
2: abc


The corresponding lex program works fine.

I have run the awk script with both the new awk and the old awk, on both
a System V machine (Silicon Graphics IRIS) and a 4BSD machine (Celerity),
and all seem to fail.
-- 

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu

m5@lynx.uucp (Mike McNally) (03/14/90)

tps@chem.ucsd.edu (Tom Stockfisch) writes:

>The following awk script doesn't behave properly:

>#! /bin/sh

>awk '/^a*[^b]/ {  print "1:", $0 }
>/^a*b/     { print "2:", $0 }
>'

>When given the following input

>b
>ab

>It produces the following output

>2: b
>1: ab
>2: ab

>Basically, the line "ab" should match only rule 2 . . .


I disagree:

    a*[^b] => <null>[^b] => <null>a

The a* is matching the empty string, and the [^b] is matching the a.
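You can check this directly. Below is a minimal sketch. It assumes an awk that provides the match() built-in, which the new awk has and the old awk may not. match() sets RSTART and RLENGTH, so the script can print exactly what /^a*[^b]/ matched on each line. The function name show_match is illustrative only and is not part of any existing tool.

```shell
#!/bin/sh
# show_match is a hypothetical helper name; assumes an awk with match().
# For each input line, report the substring matched by /^a*[^b]/,
# using RSTART (1-based start) and RLENGTH (length) set by match().
show_match() {
    awk '{
        if (match($0, /^a*[^b]/))
            print $0 ": matched [" substr($0, RSTART, RLENGTH) "] at " RSTART ", length " RLENGTH
        else
            print $0 ": no match"
    }'
}

printf 'b\nab\n' | show_match
```

This prints "b: no match" and "ab: matched [a] at 1, length 1". The a* takes nothing and [^b] consumes the a, which is the <null>a derivation shown above.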

-- 
Mike McNally                                    Lynx Real-Time Systems
uucp: {voder,athsys}!lynx!m5                    phone: 408 370 2233

            Where equal mind and contest equal, go.

magnus@rhi.hi.is (Magnus Gislason) (03/15/90)

tps@chem.ucsd.edu (Tom Stockfisch) writes:
>The following awk script doesn't behave properly:
>#! /bin/sh
>awk '/^a*[^b]/	{  print "1:", $0 }
>/^a*b/ 	{ print "2:", $0 }
>'
[stuff deleted]
>Basically, the line "ab" should match only rule 2, but it matches both
>rules.  The following script:

In regular expressions `a*' means zero or more occurrences of `a' and
`[^b]' means any character except `b'.  Here `a*[^b]' matches zero
`a's followed by a character that is not `b' (namely the `a').

I tried this with grep, egrep, sed and vi, and they all behaved like awk,
so I suppose this is the correct behavior.
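Suppose rule 1 was meant as "the character after the leading a's is not b". In that case the bracket expression has to exclude a as well, so that a* cannot hand an a over to it. The sketch below works under that assumption. The function name classify and the extra input line ac are hypothetical and are added only for illustration.

```shell
#!/bin/sh
# classify is a hypothetical helper name; assumes a POSIX-style awk on PATH.
# Same two rules as the original script, except that rule 1 uses [^ab]:
# the character after the leading a's may be neither "a" nor "b", so a*
# cannot give back an "a" for the bracket expression to match.
classify() {
    awk '/^a*[^ab]/ { print "1:", $0 }
         /^a*b/     { print "2:", $0 }'
}

printf 'b\nab\nac\n' | classify
```

This prints "2: b", "2: ab" and "1: ac", so ab now matches only rule 2. A line made up only of a's, or an empty line, matches neither rule, because [^ab] still needs a character to match. If such lines should go to rule 1, its pattern could be written as /^a*[^ab]/ || /^a*$/.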

merlyn@iwarp.intel.com (Randal Schwartz) (03/15/90)

In article <702@chem.ucsd.EDU>, tps@chem (Tom Stockfisch) writes:
| 
| The following awk script doesn't behave properly:
| 
| #! /bin/sh
| 
| awk '/^a*[^b]/	{  print "1:", $0 }
| /^a*b/ 	{ print "2:", $0 }
| '
| 
| When given the following input
| 
| b
| ab
| 
| It produces the following output
| 
| 2: b
| 1: ab
| 2: ab
| 
| Basically, the line "ab" should match only rule 2, but it matches both
| rules.

[This doesn't belong in WIZARDS.  Sorry.]

But, it *does* match rule 1!  Look carefully.  If you take zero 'a's,
and one 'not b', you can get line "ab"!

In Perl:

perl -ne 'print "1: $_" if /^a*[^b]/; print "2: $_" if /^a*b/;' <<EOF
b
ab
EOF

produces:

2: b
1: ab
2: ab

Just like awk.  Amazing.

No problem.

Just another Perl and awk hacker,
-- 
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/