[comp.bugs.sys5] awk bug

vanam@pttesac.UUCP (Marnix van Ammers) (01/31/88)

While trying to install a new program I ran across a bug in our Sys
V, release 2.1.1 (AT&T 3B20) awk.  In our awk the following pattern
always matches (even if there are 5 or less fields on the current
line):

if $6 != ""

This does not happen on the awk on my 3B1 version 3.51 .

Is this a known bug or what?

-- 
Marnix (ain't unix!) A.  van\ Ammers		Work: (415) 545-8334
Home: (707) 644-9781				CEO: MAVANAMMERS:UNIX
WORK UUCP: {ihnp4|ptsfa}!pttesac!vanam		CIS: 70027,70
HOME UUCP: pttesac!Marnix!vanam 

rupley@arizona.edu (John Rupley) (02/07/88)

In article <672@pttesac.UUCP>, vanam@pttesac.UUCP (Marnix van Ammers) writes:
> While trying to install a new program I ran across a bug in our Sys
> V, release 2.1.1 (AT&T 3B20) awk.  In our awk the following pattern
> always matches (even if there are 5 or less fields on the current
> line):
> 
> if $6 != ""
> 
> This does not happen on the awk on my 3B1 version 3.51 .
> 
> Is this a known bug or what?

Could it be a corrupt copy of awk on your release 2 system?
The following code executes properly with my SysV.r2 awk and
with the new awk (your 3.51 version?):

echo $* | awk '$6 != ""	{print "$6_!=_zerolength", NR, NF, $6}'
echo $* | awk '{if ($6 != "")print "$6_!=_zerolength", NR, NF, $6}'


John Rupley
    internet: rupley@megaron.arizona.edu
    uucp: ..{ihnp4 | hao!noao}!arizona!rupley
    Dept. Biochemistry, Univ. Arizona, Tucson  AZ  85721
    voice: (602)321-3929 (Office)   or   (602)325-4533 (Home)

levy@ttrdc.UUCP (Daniel R. Levy) (02/08/88)

In article <3748@megaron.arizona.edu>, rupley@arizona.edu (John Rupley) writes:
> In article <672@pttesac.UUCP>, vanam@pttesac.UUCP (Marnix van Ammers) writes:
> > While trying to install a new program I ran across a bug in our Sys
> > V, release 2.1.1 (AT&T 3B20) awk.  In our awk the following pattern
> > always matches (even if there are 5 or less fields on the current
> > line):
> > if $6 != ""
> > This does not happen on the awk on my 3B1 version 3.51 .
> > Is this a known bug or what?
> Could it be a corrupt copy of awk on your release 2 system?
> The following code executes properly with my SysV.r2 awk and
> with the new awk (your 3.51 version?):
> echo $* | awk '$6 != ""	{print "$6_!=_zerolength", NR, NF, $6}'
> echo $* | awk '{if ($6 != "")print "$6_!=_zerolength", NR, NF, $6}'

Alas, I must plead guilty (even though I'm not responsible for awk, I'm still
a Death-Starian) for awk's behavior in this manner on the 3B20 (we're running
2.0v3 here).  It's coming from a dereference of a null pointer (the string
"f{\0" is present beginning at location zero in a 3B20 process).  If Rupley
is using a VAX, on the other hand, everything will seem to be hunky-dory
(location 0 in a VAX [System V UNIX] process contains a zero byte, which is
tantamount to a null string).

I would posit that, just as when programming in C, testing a field without
first knowing that it is valid (the field count is high enough) is poor
programming practice.  I will eat these words if someone can show me awk
documentation that says that an undefined positional parameter is guaranteed
to be null/0 just as an undefined member of an array or previously unused
variable is guaranteed to be.  (I've written many a line of awk code using
much the same care I would use with C, and never tripped over this problem.)
Barring such a guarantee, and certainly in the present situation, it is better
practice, given that one knows that there may be less than six positional
parameters in an input record, to use

	NF >= 6 { action using $6 }

than it is to use

	$6 != "" { action using $6 }

just as you would not blithely want to do (in C):

main(argc,argv)
char **argv;
{
	foo(argv[6]);	/* what if argc <= 6 ? */
}
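A minimal C sketch of the guarded access recommended above, the C analogue of testing `NF >= 6` before touching `$6`. The helper name `arg_or_empty` is hypothetical. It relies on what C guarantees about `main`'s arguments: `argv[0]` through `argv[argc-1]` point to strings and `argv[argc]` is a null pointer, so `argv[6]` names a real argument only when `argc > 6`. When `argc == 6` it is exactly the null pointer this thread is about, and when `argc < 6` it lies outside the array.

```c
#include <stddef.h>

/* Hypothetical helper: return argument i if it exists, else "".
   argv[argc] is a null pointer and anything past it is outside the
   array, so the count is checked first, just as NF >= 6 should be
   checked before using $6 in awk. */
static const char *arg_or_empty(int argc, char *const argv[], int i)
{
    if (i < 0 || i >= argc)
        return "";              /* missing: behave like a null awk field */
    return argv[i];
}
```

In the example above this would read `foo(arg_or_empty(argc, argv, 6));`.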
-- 
|------------Dan Levy------------|  Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
|         an Engihacker @        |  	<most AT&T machines>}!ttrdc!ttrda!levy
| AT&T Computer Systems Division |  Disclaimer?  Huh?  What disclaimer???
|--------Skokie, Illinois--------|

rupley@arizona.edu (John Rupley) (02/08/88)

In article <2161@ttrdc.UUCP>,  levy@ttrdc.UUCP (Daniel R. Levy) writes:
> In article <3748@megaron.arizona.edu>, rupley@arizona.edu (John Rupley) writes:
> > In article <672@pttesac.UUCP>, vanam@pttesac.UUCP (Marnix van Ammers) writes:
> > > While trying to install a new program I ran across a bug in our Sys
> > > V, release 2.1.1 (AT&T 3B20) awk.  In our awk the following pattern
> > > always matches (even if there are 5 or less fields on the current
> > > line):
> > > if $6 != ""
> > > This does not happen on the awk on my 3B1 version 3.51 .
> > > Is this a known bug or what?
> > Could it be a corrupt copy of awk on your release 2 system?
> > The following code executes properly with my SysV.r2 awk and
> > with the new awk (your 3.51 version?):
> > echo $* | awk '$6 != ""	{print "$6_!=_zerolength", NR, NF, $6}'
> > echo $* | awk '{if ($6 != "")print "$6_!=_zerolength", NR, NF, $6}'
> 
> Alas, I must plead guilty (even though I'm not responsible for awk, I'm still
> a Death-Starian) for awk's behavior in this manner on the 3B20 (we're running
> 2.0v3 here).  It's coming from a dereference of a null pointer (the string
> "f{\0" is present beginning at location zero in a 3B20 process).  

This is a bit off the thread of the awk bug, but if the 3B20 can't 
handle a NULL pointer in awk, how does it handle C code like:
	.
	cmpstr(strchr("abcdef", 'g'), "hijk")
	.
cmpstr(s, t)
char *s, *t;
{
	[standard stuff]
}

> If Rupley
> is using a VAX, on the other hand, everything will seem to be hunky-dory
> (location 0 in a VAX [System V UNIX] process contains a zero byte, which is
> tantamount to a null string).

Rupley was using an 80286 machine, and more about that below.

> I would posit that, just as when programming in C, testing a field without
> first knowing that it is valid (the field count is high enough) is poor
> programming practice.  I will eat these words if someone can show me awk
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> documentation that says that an undefined positional parameter is guaranteed
> to be null/0 just as an undefined member of an array or previously unused
> variable is guaranteed to be.  

Wow (:-!!  Consider Aho, Kernighan, and Weinberger (A-K-W), "The AWK
Programming Language", Addison-Wesley 1988, p. 192:

'Fields that are explicitly null have the string value ""; they are not
numeric.  Nonexistent fields (i.e., fields past NF) and $0 for blank 
lines are treated this way too.'

The above statement is in the "summary" of the awk language (Appendix 
A). It took only a few minutes and the index to find equivalent 
statements, perhaps a bit clearer, in other sections of the book.

> (I've written many a line of awk code using
> much the same care I would use with C, and never tripped over this problem.)
> Barring such a guarantee, and certainly in the present situation, it is better
> practice, given that one knows that there may be less than six positional
> parameters in an input record, to use
> 
> 	NF >= 6 { action using $6 }
> 
> than it is to use
> 
> 	$6 != "" { action using $6 }

Defensive coding is probably like motherhood.  But perhaps in this case
the mothers lose.  You can create fields within an awk program (e.g.,
$5 when NF = 3), and, quoting A-K-W, p. 36:

"Any intervening fields are created when necessary and given null values."
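To make the two A-K-W rules concrete, here is a small C model of a record (a sketch only, not how any real awk stores its fields): reading a field past NF yields the null string, and assigning past NF creates the intervening fields with null values. The names `struct record`, `get_field`, `set_field`, and the 100-field limit are assumptions for illustration.

```c
#include <stddef.h>

#define MAXFIELDS 100   /* assumed limit, for illustration only */

/* Hypothetical model of an awk record: f[1]..f[nf] are the fields.
   $0, the whole record, is not modeled. */
struct record {
    const char *f[MAXFIELDS + 1];
    int nf;
};

/* Reading $i: nonexistent fields (past NF) yield "". */
static const char *get_field(const struct record *r, int i)
{
    if (i < 1 || i > r->nf)
        return "";
    return r->f[i];
}

/* Assigning $i: if i > NF, fields NF+1..i-1 are created as "" and
   NF becomes i.  Returns 0 on success, -1 if i is out of range. */
static int set_field(struct record *r, int i, const char *s)
{
    if (i < 1 || i > MAXFIELDS)
        return -1;
    for (int j = r->nf + 1; j < i; j++)
        r->f[j] = "";           /* intervening fields get null values */
    r->f[i] = s;
    if (i > r->nf)
        r->nf = i;
    return 0;
}
```

Under this model `$6 != ""` on a three-field record is simply false, which is the behavior the book describes.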


Back to awk, the 80286, and bugs.  First, there are indeed bugs in awk, 
specifically the new awk.  There have been several postings of new awk 
bugs, with fixes.  I don't think a deficiency (feature?) of the 3B20 
system should be considered an awk bug, however. Second, bringing new 
awk up on an 80286 was a bit unpleasant, owing to the coders' 
assumption that sizeof (int) = sizeof (int *).  Should we be horrified, 
annoyed, or whatever that AT&T, the home of C, assumes all the world's 
a VAX (:-?  Seriously, I do hope that future software sold by AT&T will 
be written to be properly portable.

John Rupley
 uucp: ..{ihnp4 | hao!noao}!arizona!rupley!local
 internet: rupley!local@megaron.arizona.edu
 (H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
 (O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929

wjc@ho5cad.ATT.COM (02/10/88)

In article <3763@megaron.arizona.edu> rupley@arizona.edu (John Rupley) writes:
>This is a bit off the thread of the awk bug, but if the 3B20 can't 
>handle a NULL pointer in awk, how does it handle C code like:
>	   .
>	   cmpstr(strchr("abcdef", 'g'), "hijk")
>	   .
>cmpstr(s, t)
>char *s, *t;
>{
>	   [standard stuff]
>}
>

I think you've misinterpreted slightly the earlier poster's (Levy)
remark about why dereferencing a null pointer caused a problem in awk.
It's not that the machine can't handle it.  In fact, it goes beyond
the call of C, so to speak.

A pointer whose value is zero is defined as a pointer which does not
point at any valid object.  It just so happens that if you do
erroneously reference it on a 3b20, you get its famous "f(".  For
example, if your "[standard stuff]" was a

	printf ("|%s|\n", s);

it would yield

	|f(|

Of course, your code would contain a portability bug in such a case,
since it would be illegal to try to use that null pointer.  Usually
fixed by this timeworn macro:

	#define VIS(s) ((s)?(s):"")
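A small C sketch of that macro applied to the call from Rupley's example. `strchr()` returns a null pointer when the character is absent, so its result goes through `VIS` (or an equivalent function, here hypothetically named `vis`) before it reaches `strcmp()`. One caveat with the macro form: it evaluates its argument twice, so `VIS(f())` calls `f()` twice; the function form avoids that. The name `compare_missing` is also hypothetical.

```c
#include <string.h>

/* The timeworn macro: substitute "" for a null pointer.
   Note that it evaluates s twice. */
#define VIS(s) ((s)?(s):"")

/* Hypothetical function form: evaluates its argument exactly once. */
static const char *vis(const char *s)
{
    return s ? s : "";
}

/* Rupley's call made safe: 'g' is not in "abcdef", so strchr()
   returns a null pointer and vis() turns it into "".  The result
   is negative because "" sorts before "hijk". */
static int compare_missing(void)
{
    return strcmp(vis(strchr("abcdef", 'g')), "hijk");
}
```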

Contrast this with doing the same thing on (a) the original VAX series,
which happened to always have a null character at location zero, so it
generally worked out (but left you with a portability time bomb); (b) a
Sun, which dumps core if you dereference a null at all (think of this
as runtime validity checking :-); (c) VAX 86xx, where you find at
location zero a string something like "}^A^C" (which looks quite
attractive when it spills out on the screen).

(I'm beating a dead horse now ...)  There are a couple of helpers for
this null pointer business in some places.  Some C compilers have a flag
which makes the low bytes of the program unreadable.  This sort of
acts like a Sun in this respect, but with less trauma.  Also, some
implementations of the printf() family explicitly convert null pointers
to null strings for %s.

	Bill Carpenter
	(AT&T gateways)!ho5cad!wjc
	HO 1L-410, (201)949-8392, OCW x4367

ekb@ho7cad.ATT.COM (Eric K. Bustad) (02/10/88)

In article <3763@megaron.arizona.edu>, rupley@arizona.edu (John Rupley) writes:
> This is a bit off the thread of the awk bug, but if the 3B20 can't 
> handle a NULL pointer in awk, how does it handle C code like:
> 	.
> 	cmpstr(strchr("abcdef", 'g'), "hijk")
> 	.
> cmpstr(s, t)
> char *s, *t;
> {
> 	[standard stuff]
> }

It handles C code like the above badly, because the code is basically
wrong.  C does not treat a NULL pointer as equivalent to a null (empty) string.

Any code that does this will happen to work on a VAX running UNIX, but
will fail on many more machines than just AT&T's 3B20.  I seem to recall
that on some machines you will get a memory access error if you dereference
a NULL pointer!  I rather wish that the 3B20 did this, so that these
errors would have been caught much earlier.
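The distinction can be shown in a few lines of C: `""` is a real one-byte array holding only the terminating `'\0'`, while a null pointer points at no object at all, so only the former may be passed to `strlen()` or `strcmp()`. The enum and the `classify` function below are hypothetical helpers for illustration.

```c
#include <stddef.h>

/* Hypothetical classifier.  Code that relied on the VAX's zero byte
   at location zero effectively treated the first two cases as one. */
enum strkind { STR_NULL_POINTER, STR_EMPTY, STR_NONEMPTY };

static enum strkind classify(const char *s)
{
    if (s == NULL)
        return STR_NULL_POINTER;  /* no object: must not be dereferenced */
    if (s[0] == '\0')
        return STR_EMPTY;         /* a valid string of length zero */
    return STR_NONEMPTY;
}
```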

allbery@ncoast.UUCP (Brandon Allbery) (02/13/88)

As quoted from <2161@ttrdc.UUCP> by levy@ttrdc.UUCP (Daniel R. Levy):
+---------------
| I would posit that, just as when programming in C, testing a field without
| first knowing that it is valid (the field count is high enough) is poor
| programming practice.  I will eat these words if someone can show me awk
| documentation that says that an undefined positional parameter is guaranteed
| to be null/0 just as an undefined member of an array or previously unused
| variable is guaranteed to be.  (I've written many a line of awk code using
+---------------

The bible ("Awk - A Pattern Scanning and Processing Language") doesn't say
anything one way or the other.  On the other hand, the awk guide which came
with my 3B1 says:  "If NF < i <= 100, then $i behaves like an uninitialized
var."  I would say that the issue is open, but the behavior on the 3B20 is
still wrong: it should either return a null string or cause an error; it
should NOT return "f{" or anything like that.  After all, given this behavior,
if I try it on ncoast, awk will dump core (address 0 isn't mapped).
-- 
	      Brandon S. Allbery, moderator of comp.sources.misc
       {well!hoptoad,uunet!hnsurg3,cbosgd,sun!mandrill}!ncoast!allbery
KABOOM!!! Worf: "I think I'm sick." LaForge: "I'm sure half the ship knows it."

allbery@ncoast.UUCP (Brandon Allbery) (02/17/88)

As quoted from <275@ho7cad.ATT.COM> by ekb@ho7cad.ATT.COM (Eric K. Bustad):
+---------------
| In article <3763@megaron.arizona.edu>, rupley@arizona.edu (John Rupley) writes:
| Any code that does this will happen to work on a VAX running UNIX, but
| will fail on many more machines than just AT&T's 3B20.  I seem to recall
| that on some machines you will get a memory access error if you dereference
| a NULL pointer!  I rather wish that the 3B20 did this, so that these
| errors would have been caught much earlier.
+---------------

The COFF loader supports the -Z option to force address zero to not be mapped.
Alternatively, I believe SVR3 ld uses "ifiles" to construct load images; in
this case, you can edit the ifiles to not map address zero:

MEMORY {
	user_mem : origin = 0x2, length = 0xffffffff
}

which ensures that the first word of memory is never mapped.  If your ld
doesn't allow either of these but DOES use ifiles, create an ifile with
the above declaration and make it the first argument to ld.  (The length might
have to be tuned for a particular system, e.g. 3B1s can't map past address
0x300000 due to the shared memory hack.)
-- 
	      Brandon S. Allbery, moderator of comp.sources.misc
       {well!hoptoad,uunet!hnsurg3,cbosgd,sun!mandrill}!ncoast!allbery
KABOOM!!! Worf: "I think I'm sick." LaForge: "I'm sure half the ship knows it."