vanam@pttesac.UUCP (Marnix van Ammers) (01/31/88)
While trying to install a new program I ran across a bug in our Sys
V, release 2.1.1 (AT&T 3B20) awk. In our awk the following pattern
always matches (even if there are 5 or less fields on the current
line):

    if $6 != ""

This does not happen on the awk on my 3B1 version 3.51.

Is this a known bug or what?

--
Marnix (ain't unix!) A. van\ Ammers
Work: (415) 545-8334    Home: (707) 644-9781
CEO: MAVANAMMERS:UNIX
WORK UUCP: {ihnp4|ptsfa}!pttesac!vanam    CIS: 70027,70
HOME UUCP: pttesac!Marnix!vanam
rupley@arizona.edu (John Rupley) (02/07/88)
In article <672@pttesac.UUCP>, vanam@pttesac.UUCP (Marnix van Ammers) writes:
> While trying to install a new program I ran across a bug in our Sys
> V, release 2.1.1 (AT&T 3B20) awk. In our awk the following pattern
> always matches (even if there are 5 or less fields on the current
> line):
>
>     if $6 != ""
>
> This does not happen on the awk on my 3B1 version 3.51 .
>
> Is this a known bug or what?

Could it be a corrupt copy of awk on your release 2 system?
The following code executes properly with my SysV.r2 awk and
with the new awk (your 3.51 version?):

    echo $* | awk '$6 != "" {print "$6_!=_zerolength", NR, NF, $6}'
    echo $* | awk '{if ($6 != "")print "$6_!=_zerolength", NR, NF, $6}'

John Rupley
internet: rupley@megaron.arizona.edu
uucp: ..{ihnp4 | hao!noao}!arizona!rupley
Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721
voice: (602)321-3929 (Office) or (602)325-4533 (Home)
levy@ttrdc.UUCP (Daniel R. Levy) (02/08/88)
In article <3748@megaron.arizona.edu>, rupley@arizona.edu (John Rupley) writes:
> In article <672@pttesac.UUCP>, vanam@pttesac.UUCP (Marnix van Ammers) writes:
> > While trying to install a new program I ran across a bug in our Sys
> > V, release 2.1.1 (AT&T 3B20) awk. In our awk the following pattern
> > always matches (even if there are 5 or less fields on the current
> > line):
> >     if $6 != ""
> > This does not happen on the awk on my 3B1 version 3.51 .
> > Is this a known bug or what?
> Could it be a corrupt copy of awk on your release 2 system?
> The following code executes properly with my SysV.r2 awk and
> with the new awk (your 3.51 version?):
> echo $* | awk '$6 != "" {print "$6_!=_zerolength", NR, NF, $6}'
> echo $* | awk '{if ($6 != "")print "$6_!=_zerolength", NR, NF, $6}'

Alas, I must plead guilty (even though I'm not responsible for awk, I'm still
a Death-Starian) for awk's behavior in this manner on the 3B20 (we're running
2.0v3 here). It's coming from a dereference of a null pointer (the string
"f{\0" is present beginning at location zero in a 3B20 process). If Rupley
is using a VAX, on the other hand, everything will seem to be hunky-dory
(location 0 in a VAX [System V UNIX] process contains a zero byte, which is
tantamount to a null string).

I would posit that, just as when programming in C, testing a field without
first knowing that it is valid (the field count is high enough) is poor
programming practice. I will eat these words if someone can show me awk
documentation that says that an undefined positional parameter is guaranteed
to be null/0 just as an undefined member of an array or previously unused
variable is guaranteed to be. (I've written many a line of awk code using
much the same care I would use with C, and never tripped over this problem.)

Barring such a guarantee, and certainly in the present situation, it is better
practice, given that one knows that there may be fewer than six positional
parameters in an input record, to use

    NF >= 6 { action using $6 }

than it is to use

    $6 != "" { action using $6 }

just as you would not blithely want to do (in C):

    main(argc,argv)
    char **argv;
    {
        foo(argv[6]);   /* what if argc < 6 ? */
    }

--
|------------Dan Levy------------|  Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
|         an Engihacker @        |      <most AT&T machines>}!ttrdc!ttrda!levy
| AT&T Computer Systems Division |  Disclaimer?  Huh?  What disclaimer???
|--------Skokie, Illinois--------|
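The NF guard recommended above is easy to try out. What follows is only a
minimal sketch, assuming a POSIX shell and some awk on the PATH; the function
name sixth_field is made up for this illustration and does not come from the
thread. It touches $6 only when NF says the field exists, so it does not
depend on what a given awk returns for a field past NF.

```shell
#!/bin/sh
# sixth_field is a hypothetical helper name, not part of any awk distribution.
# It prints the sixth field of each input record, but only for records
# that actually have at least six fields.
sixth_field() {
    awk 'NF >= 6 { print $6 }'
}

printf 'a b c\n' | sixth_field          # prints nothing: only 3 fields
printf 'a b c d e f g\n' | sixth_field  # prints: f
```

On an awk with the 3B20 behavior described above, the pattern $6 != "" would
reportedly match the first record as well; the NF test sidesteps the question.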
rupley@arizona.edu (John Rupley) (02/08/88)
In article <2161@ttrdc.UUCP>, levy@ttrdc.UUCP (Daniel R. Levy) writes:
> In article <3748@megaron.arizona.edu>, rupley@arizona.edu (John Rupley) writes:
> > In article <672@pttesac.UUCP>, vanam@pttesac.UUCP (Marnix van Ammers) writes:
> > > While trying to install a new program I ran across a bug in our Sys
> > > V, release 2.1.1 (AT&T 3B20) awk. In our awk the following pattern
> > > always matches (even if there are 5 or less fields on the current
> > > line):
> > >     if $6 != ""
> > > This does not happen on the awk on my 3B1 version 3.51 .
> > > Is this a known bug or what?
> > Could it be a corrupt copy of awk on your release 2 system?
> > The following code executes properly with my SysV.r2 awk and
> > with the new awk (your 3.51 version?):
> > echo $* | awk '$6 != "" {print "$6_!=_zerolength", NR, NF, $6}'
> > echo $* | awk '{if ($6 != "")print "$6_!=_zerolength", NR, NF, $6}'
>
> Alas, I must plead guilty (even though I'm not responsible for awk, I'm still
> a Death-Starian) for awk's behavior in this manner on the 3B20 (we're running
> 2.0v3 here). It's coming from a dereference of a null pointer (the string
> "f{\0" is present beginning at location zero in a 3B20 process).

This is a bit off the thread of the awk bug, but if the 3B20 can't
handle a NULL pointer in awk, how does it handle C code like:

        .
        cmpstr(strchr("abcdef", 'g'), "hijk")
        .
    cmpstr(s, t)
    char *s, *t;
    {
        [standard stuff]
    }

> If Rupley
> is using a VAX, on the other hand, everything will seem to be hunky-dory
> (location 0 in a VAX [System V UNIX] process contains a zero byte, which is
> tantamount to a null string).

Rupley was using an 80286 machine, and more about that below.

> I would posit that, just as when programming in C, testing a field without
> first knowing that it is valid (the field count is high enough) is poor
> programming practice.
> I will eat these words if someone can show me awk
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> documentation that says that an undefined positional parameter is guaranteed
> to be null/0 just as an undefined member of an array or previously unused
> variable is guaranteed to be.

Wow (:-!! Consider A-K-W, "The AWK Programming Language", A-W 1988, p 192:

    'Fields that are explicitly null have the string value ""; they
    are not numeric. Nonexistent fields (i.e., fields past NF) and $0
    for blank lines are treated this way too.'

The above statement is in the "summary" of the awk language (Appendix A).
It took only a few minutes and the index to find equivalent statements,
perhaps a bit clearer, in other sections of the book.

> (I've written many a line of awk code using
> much the same care I would use with C, and never tripped over this problem.)
> Barring such a guarantee, and certainly in the present situation, it is better
> practice, given that one knows that there may be less than six positional
> parameters in an input record, to use
>
>     NF >= 6 { action using $6 }
>
> than it is to use
>
>     $6 != "" { action using $6 }

Defensive coding is probably like motherhood. But perhaps in this case
the mothers lose. You can create fields within an awk program (e.g., $5
when NF = 3), and, quoting A-K-W, p 36:

    "Any intervening fields are created when necessary and given
    null values."

Back to awk, the 80286, and bugs.

First, there are indeed bugs in awk, specifically the new awk. There have
been several postings of new awk bugs, with fixes. I don't think a
deficiency (feature?) of the 3B20 system should be considered an awk bug,
however.

Second, bringing new awk up on an 80286 was a bit unpleasant, owing to the
coders' assumption that sizeof (int) = sizeof (int *). Should we be
horrified, annoyed, or whatever that AT&T, the home of C, assumes all the
world's a VAX (:-? Seriously, I do hope that future software sold by AT&T
will be written to be properly portable.

> |------------Dan Levy------------|  Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
> |         an Engihacker @        |      <most AT&T machines>}!ttrdc!ttrda!levy
> | AT&T Computer Systems Division |  Disclaimer?  Huh?  What disclaimer???
> |--------Skokie, Illinois--------|

John Rupley
uucp: ..{ihnp4 | hao!noao}!arizona!rupley!local
internet: rupley!local@megaron.arizona.edu
(H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
(O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929
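The two A-K-W passages quoted in the post above can be checked on whatever
awk is at hand. This is a minimal sketch, assuming a POSIX shell and an awk
that follows the book (the 3B20 awk discussed in this thread evidently would
not pass the first check); the function name field_demo is hypothetical.

```shell
#!/bin/sh
# field_demo is a made-up name for this illustration only.
field_demo() {
    awk 'BEGIN { OFS = "," }
    {
        # Reading a field past NF yields the null string and leaves NF alone.
        if ($6 == "") print "field 6 is null, NF=" NF
        # Assigning past NF creates the intervening fields as null strings
        # and rebuilds $0 with OFS between the fields.
        $5 = "x"
        print "NF=" NF, $0
    }'
}

printf 'a b c\n' | field_demo
# prints:
#   field 6 is null, NF=3
#   NF=5,a,b,c,,x
```

OFS is set to a comma here only so that the empty fourth field is visible in
the rebuilt record.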
wjc@ho5cad.ATT.COM (02/10/88)
In article <3763@megaron.arizona.edu> rupley@arizona.edu (John Rupley) writes:
>This is a bit off the thread of the awk bug, but if the 3B20 can't
>handle a NULL pointer in awk, how does it handle C code like:
>    .
>    cmpstr(strchr("abcdef", 'g'), "hijk")
>    .
>cmpstr(s, t)
>char *s, *t;
>{
>    [standard stuff]
>}
>

I think you've misinterpreted slightly the earlier poster's (Levy) remark
about why dereferencing a null pointer caused a problem in awk. It's not
that the machine can't handle it. In fact, it goes beyond the call of C,
so to speak.

A pointer whose value is zero is defined as a pointer which does not point
at any valid object. It just so happens that if you do erroneously
reference it on a 3b20, you get its famous "f(". For example, if your
"[standard stuff]" was a

    printf ("|%s|\n", s);

it would yield

    |f(|

Of course, your code would contain a portability bug in such a case, since
it would be illegal to try to use that null pointer. Usually fixed by this
timeworn macro:

    #define VIS(s) ((s)?(s):"")

Contrast this with doing the same thing on

(a) original VAX series, which happened to always have a null character at
    location zero, so it generally worked out (but left you with a
    portability time bomb),
(b) a Sun, which dumps core if you dereference a null at all (think of
    this as runtime validity checking :-);
(c) VAX 86xx, where you find at location zero a string something like
    "}^A^C" (which looks quite attractive when it spills out on the
    screen).

(I'm beating a dead horse now ...) There are a couple of helpers for this
null pointer business some places. Some C compilers have a flag which
makes the low bytes of the program unreadable. This sort of acts like a
Sun in this respect, but with less trauma. Also, some implementations of
the printf() family explicitly convert null pointers to null strings for
%s.

Bill Carpenter
(AT&T gateways)!ho5cad!wjc
HO 1L-410, (201)949-8392, OCW x4367
ekb@ho7cad.ATT.COM (Eric K. Bustad) (02/10/88)
In article <3763@megaron.arizona.edu>, rupley@arizona.edu (John Rupley) writes:
> This is a bit off the thread of the awk bug, but if the 3B20 can't
> handle a NULL pointer in awk, how does it handle C code like:
>     .
>     cmpstr(strchr("abcdef", 'g'), "hijk")
>     .
> cmpstr(s, t)
> char *s, *t;
> {
>     [standard stuff]
> }

It handles C code like the above badly, because the code is basically
wrong. C does not treat the NULL pointer as equivalent to a NULL string.
Any code that does this will happen to work on a VAX running UNIX, but
will fail on many more machines than just AT&T's 3B20. I seem to recall
that on some machines you will get a memory access error if you dereference
a NULL pointer! I rather wish that the 3B20 did this, so that these
errors would have been caught much earlier.
allbery@ncoast.UUCP (Brandon Allbery) (02/13/88)
As quoted from <2161@ttrdc.UUCP> by levy@ttrdc.UUCP (Daniel R. Levy):
+---------------
| I would posit that, just as when programming in C, testing a field without
| first knowing that it is valid (the field count is high enough) is poor
| programming practice. I will eat these words if someone can show me awk
| documentation that says that an undefined positional parameter is guaranteed
| to be null/0 just as an undefined member of an array or previously unused
| variable is guaranteed to be. (I've written many a line of awk code using
+---------------

The bible ("Awk - A Pattern Scanning and Processing Language") doesn't say
anything one way or the other. On the other hand, the awk guide which came
with my 3B1 says: "If NF < i <= 100, then $i behaves like an uninitialized
var."

I would say that the issue is open -- but the behavior on the 3B20 is still
wrong: it should either return a null string or cause an error; it should
NOT return "f{" or anything like that. After all, given this behavior, if I
try it on ncoast, awk will dump core (address 0 isn't mapped).
--
Brandon S. Allbery, moderator of comp.sources.misc
{well!hoptoad,uunet!hnsurg3,cbosgd,sun!mandrill}!ncoast!allbery
KABOOM!!! Worf: "I think I'm sick." LaForge: "I'm sure half the ship knows it."
allbery@ncoast.UUCP (Brandon Allbery) (02/17/88)
As quoted from <275@ho7cad.ATT.COM> by ekb@ho7cad.ATT.COM (Eric K. Bustad):
+---------------
| In article <3763@megaron.arizona.edu>, rupley@arizona.edu (John Rupley) writes:
| Any code that does this will happen to work on a VAX running UNIX, but
| will fail on many more machines than just AT&T's 3B20. I seem to recall
| that on some machines you will get a memory access error if you dereference
| a NULL pointer! I rather wish that the 3B20 did this, so that these
| errors would have been caught much earlier.
+---------------

The COFF loader supports the -Z option to force address zero to not be
mapped. Alternatively, I believe SVR3 ld uses "ifiles" to construct load
images; in this case, you can edit the ifiles to not map address zero:

    MEMORY {
        user_mem : origin = 0x2, length = 0xffffffff
    }

which ensures that the first word of memory is never mapped. If your ld
doesn't allow either of these but DOES use ifiles, create an ifile with the
above declaration and make it the first argument to ld. (The length might
have to be tuned for a particular system, e.g. 3B1s can't map past address
0x300000 due to the shared memory hack.)
--
Brandon S. Allbery, moderator of comp.sources.misc
{well!hoptoad,uunet!hnsurg3,cbosgd,sun!mandrill}!ncoast!allbery
KABOOM!!! Worf: "I think I'm sick." LaForge: "I'm sure half the ship knows it."