[comp.unix.questions] Strange behaviour of awk

wsinkees@eutrc3.UUCP (Kees Huizing) (02/24/89)

In the use of the UNIX filter "awk" I encountered the following problem.
I want to save a line of input except the first field into a variable
for later use.  So I empty the first field and then assign the modified
record to the variable.
But how curious!  When I try the test case
	{$1 = "" ; x = $0 ; print x}
on the input
	Tom Dick Harry
I just get :  
	Tom Dick Harry
Testing the value of $0 by printing it
	{$1 = "" ; print}
yields the expected output
	 Dick Harry
So my solution is:
	{$1 = "" ; print >"/dev/null" ; x = $0 ; print x}
And this works!, yielding
	 Dick Harry

Now I have two questions:
1. How can the value of $0 depend on whether it has been printed or not?
   Is this a bug, or do I overlook some mechanism of awk?
2. Is there a direct way to get $2, $3, .... until the end of the line (record)
   without the somewhat dirty change of $1?  This was my original problem.

P.S. We have Ultrix 2.2 (appr. Unix BSD); I don't know how "new" our awk is.

Kees Huizing              wsinkees@eutrc3.UUCP -or- wsdckees@heitue5.BITNET
Dept. of Math. and Comp. Sc. - Eindhoven Univ. of Techn. - Eindhoven
                    T H E     N E T H E R L A N D S

dph@lanl.gov (David Huelsbeck) (02/27/89)

From article <497@eutrc3.UUCP>, by wsinkees@eutrc3.UUCP (Kees Huizing):
> 
> In the use of the UNIX filter "awk" I encountered the following problem.
> I want to save a line of input except the first field into a variable
> for later use.  So I empty the first field and then assign the modified
> record to the variable.
> But how curious!  When I try the test case
> 	{$1 = "" ; x = $0 ; print x}
> on the input
> 	Tom Dick Harry
> I just get :  
> 	Tom Dick Harry
> Testing the value of $0 by printing it
> 	{$1 = "" ; print}
> yields the expected output
> 	 Dick Harry
> So my solution is:
> 	{$1 = "" ; print >"/dev/null" ; x = $0 ; print x}
> And this works!, yielding
> 	 Dick Harry
> 
> Now I have two questions:
> 1. How can the value of $0 depend on whether it has been printed or not?
>    Is this a bug, or do I overlook some mechanism of awk?


It is a fairly well known bug in the BSD 4.2 version of awk that 
assignments to $1-$n don't change the value of $0.  

I'd never seen or tried the trick of printing $0.
How did you happen to think of this?



> 2. Is there a direct way to get $2, $3, .... until the end of the line (record)
>    without the somewhat dirty change of $1?  This was my original problem.
> 

You could use a "for (i=2; i <= NF; i++) { x = x $i }" but you'll lose
the field separators.  If you know that they'll always be some set bit
of whitespace like space or tab they're easy to replace.  Or perhaps an
sscanf that throws the first field into a dummy and the rest into x.

> P.S. We have Ultrix 2.2 (appr. Unix BSD); I don't know how "new" our awk is.
> 

I don't know about 2.2 but the original Ultrix awk was bug for bug
compatible with BSD 4.2 awk.  This is not really surprising; I think
the code was exactly the same. 

> Kees Huizing              wsinkees@eutrc3.UUCP -or- wsdckees@heitue5.BITNET
> Dept. of Math. and Comp. Sc. - Eindhoven Univ. of Techn. - Eindhoven
>                     T H E     N E T H E R L A N D S


-dph

jes@mbio.med.upenn.edu (Joe Smith) (02/27/89)

>   Is this a bug, or do I overlook some mechanism of awk?

The AWK book says that "when one of $1, $2, etc., is changed, $0 is
reconstructed using OFS to separate fields".  So it's definitely a
bug.  However, unless you are using System V, you probably have old
awk which may have bugs that have been fixed in new awk.  Our (SunOS
4.0) old awk suffers the same bug.
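
To make the quoted rule concrete, here is a minimal sketch, assuming a
POSIX-style ("new") awk is installed as "awk".  OFS is set to "-" only
so that the rebuilt record is easy to spot; the function name
blank_first_field is made up for illustration.

```shell
# Assumes a POSIX ("new") awk; an old awk may print the record unchanged.
blank_first_field() {
    # Assigning to $1 makes awk rebuild $0 from the fields, joined by OFS.
    awk 'BEGIN { OFS = "-" } { $1 = ""; x = $0; print x }'
}

echo 'Tom Dick Harry' | blank_first_field   # prints "-Dick-Harry"
```

The leading "-" is the separator that follows the now-empty first field.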

>2. Is there a direct way to get $2, $3, .... until the end of the line (record)
>   without the somewhat dirty change of $1?  This was my original problem.

You could just concatenate the fields with a simple loop:

	x = ""
	for (f = 2; f <= NF; ++f)	# skip $1, include the last field
		x = x " " $f		# fields separated by space
	print x

That won't preserve the spacing from the input line, but I don't think *any*
awk will do that.  If the spacing is important, maybe you could use sed:

	sed 's/^ *[^ ]* *//' file
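
As a quick check of the sed approach, the sketch below feeds it a line
with irregular spacing.  The function name rest_of_line and the sample
input are made up for illustration, and the pattern treats only spaces
(not tabs) as separators.

```shell
# Strip leading spaces, the first field, and the spaces that follow it.
# Everything after that is passed through untouched.
rest_of_line() {
    sed 's/^ *[^ ]* *//'
}

printf 'Tom   Dick  Harry\n' | rest_of_line   # prints "Dick  Harry"
```

The two spaces between "Dick" and "Harry" survive, which the awk loop
cannot guarantee.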


--
jes@mbio.med.upenn.edu

University of Pennsylvania
Dept. of Biochemistry and Biophysics
233 Anatomy-Chemistry
Philadelphia, PA 19104-6059
(215) 898-8348

guy@auspex.UUCP (Guy Harris) (02/28/89)

 >The AWK book says that "when one of $1, $2, etc., is changed, $0 is
 >reconstructed using OFS to separate fields".  So it's definitely a
 >bug.  However, unless you are using System V, you probably have old
 >awk which may have bugs that have been fixed in new awk.  Our (SunOS
 >4.0) old awk suffers the same bug.

The SunOS 4.0 "awk" is based on the S5R2 one (mainly because it's
significantly faster than the version of "old awk" that comes with
4.xBSD); this means it's quite likely that *every* version of "old awk"
has the problem in question.  I hesitate to call it a "bug" unless some
documentation on "old awk" says $0 is reconstructed; the AWK book
describes only "new awk", and describes a *lot* of stuff that doesn't
work in "old awk" because it wasn't in "old awk".

jes@mbio.med.upenn.edu (Joe Smith) (02/28/89)

>...  I hesitate to call it a "bug" unless some
>documentation on "old awk" says $0 is reconstructed; the AWK book
>describes only "new awk", and describes a *lot* of stuff that doesn't
>work in "old awk" because it wasn't in "old awk".

Certainly an important point I didn't mention.  However, I think
it's safe to say that having a variable change its contents when it's
printed is a very surprising and undesirable "feature" of a
programming language.

I also meant to mention that GNU-awk handles the test case just fine.
It chokes, however, on slightly more complicated tests, so it may not
handle the original poster's problem:
     $ gawk '{ $1 = ""; print $0}'
     1 2 3
      2 3					ok

     $ gawk '{ $1 = ""; x = $0; print $0}'
     1 2 3
     2]yNzp3]y|@N||@}d|pN|0		oops!
      2 3N|4
      2 3
--
jes@mbio.med.upenn.edu

University of Pennsylvania
Dept. of Biochemistry and Biophysics
233 Anatomy-Chemistry
Philadelphia, PA 19104-6059
(215) 898-8348

lang@pearl.PRC.Unisys.COM (Francois-Michel Lang) (02/28/89)

In article <497@eutrc3.UUCP> wsinkees@eutrc3.UUCP (Kees Huizing) writes:
>
>In the use of the UNIX filter "awk" I encountered the following problem.
...
>1. How can the value of $0 depend on whether it has been printed or not?
>   Is this a bug, or do I overlook some mechanism of awk?

This is a known bug that is pointed out in the document by John W. Pierce
    "A Supplemental Document for AWK, or,
     Things that Al, Pete, and Brian didn't mention much".

That paper also describes a number of nifty features that are not documented
in the standard UNIX description for AWK.
----------------------------------------------------------------------------
Francois-Michel Lang
Paoli Research Center, Unisys Corporation lang@prc.unisys.com (215) 648-7256
Dept of Comp & Info Science, U of PA      lang@cis.upenn.edu  (215) 898-9511

lukas@ihlpf.ATT.COM (00771g-Lukas) (03/03/89)

In article <9398@burdvax.PRC.Unisys.COM> lang@pearl.PRC.Unisys.COM (Francois-Michel Lang) writes:
>This is a known bug that is pointed out in the document by John W. Pierce
>    "A Supplemental Document for AWK, or,
>     Things that Al, Pete, and Brian didn't mention much".

Anyone know how to go about
getting a copy of this
document?
-- 

	John Lukas
	att!ihlpf!lukas
	312-510-6290

jim@bilpin.UUCP (Jim G) (03/06/89)

    #{ v_unix.2 }
    IN 	ARTICLE <497@eutrc3.UUCP> , wsinkees@eutrc3.UUCP (Kees Huizing)
    WRITES :

    [ stuff deleted ]

>   Now I have two questions:
>   1. How can the value of $0 depend on whether it has been printed or not?
>      Is this a bug, or do I overlook some mechanism of awk?
>   2. Is there a direct way to get $2, $3, .... until the end of the line (record)
>      without the somewhat dirty change of $1?  This was my original problem.

    1. Our version of awk ( UNIX  System V Rel.2:01 ) works correctly here,
       but such assignments cause compression of white space ( multiple
       tabs/spaces between fields become single spaces ), so I would have to
       defer to {dph@lanl.gov (David Huelsbeck)}'s comments on that, as
       regards the specific problem on your system.
    2. A neat way to print from $? to end of line, as long as you are sure 
       that the value of $? will not appear as an earlier field in the line, 
       is : 	
		print substr( $0, index( $0, $? ) )
       ( see Aho/Weinberger/Kernighan's book 'The AWK Programming Language',
         p.42, for a summary of all the string functions )
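
    A small sketch of the substr/index idea, with $2 standing in for the
    "$?" placeholder.  The function name from_second_field and the sample
    lines are assumptions for illustration only.  The second call shows
    the caveat: index() finds the first occurrence of the text anywhere in
    the record, so a match inside an earlier field also throws it off.

```shell
# Print the record from the first occurrence of $2 to the end of the line.
from_second_field() {
    awk '{ print substr($0, index($0, $2)) }'
}

printf 'Tom   Dick  Harry\n' | from_second_field   # prints "Dick  Harry"
printf 'ab b c\n' | from_second_field              # prints "b b c", not "b c"
```

    In the second line, "b" first occurs inside "ab", so the output starts
    one character too early.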
-- 
			   <Jim G, Hatfield, England>
	   <Path: mcvax!ukc!icdoc!bilpin!jim> <UUCP: jim@bilpin.uucp>
  Programmers' maxim : If it's not aesthetically pleasing, it's probably wrong.