[comp.editors] Awk

alonso@maxwell.mmwb.ucsf.edu (Darwin Alonso) (03/08/91)

I wasn't sure of the correct newsgroup for this. Maybe
*unix would have been better, but awk is a text-processing
tool.

I have a question about how awk compares values with the greater-than
and less-than operators (> and <). It seems to compare floating-point numbers
if the field is a number, but compares ASCII values (?) if the field
is not a number.
Is this intentional?
The following illustrates the unexpected (to me) results.


#---------------   original.file   --------

This is text
1.
2.
3.
3.9
4.
4.1
5.
6.
7.
8.
9.
10.


#---------------------------------
#     Output from nawk '$1 > 4 {print $0}' original.file

This is text
4.1
5.
6.
7.
8.
9.
10.

#---------------------------------
#     Output from nawk '$1 < 4 {print $0}' original.file

1.
2.
3.
3.9

#--------------------------------------------
I would have expected the "This is text" line to appear in
both cases or in neither. I couldn't find anything
on this in the awk book (Aho, Kernighan, Weinberger).

Thanks,

louxj@jacobs.CS.ORST.EDU (John W. Loux) (03/10/91)

In article <17851@cgl.ucsf.EDU> alonso@maxwell.mmwb.ucsf.edu (Darwin Alonso) writes:
>I have a question about how awk compares values with the greater-than
>and less-than operators (> and <). It seems to compare floating-point numbers
>if the field is a number, but compares ASCII values (?) if the field
>is not a number.
>Is this intentional?
>
>[...] I couldn't find anything
>on this in the awk book (Aho, Kernighan, Weinberger).

In ``the book'' (The AWK Programming Language by Aho, Kernighan and
Weinberger), page 25 says:

``In a relational comparison, if both operands are numeric, a numeric
comparison is made; otherwise, any numeric operand is converted to a string,
and then the operands are compared as strings.  The strings are compared
character by character using the ordering provided by the machine, most often
the ASCII character set.  One string is said to be `less than' another if it
would appear before the other according to this ordering, e.g., "Canada" <
"China" and "Asia" < "Asian".''

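To see the book's rule in action, here is a small sketch, assuming a POSIX awk invoked as awk (the original post used nawk). The helper name string_compare_demo is hypothetical. String constants are always strings, so "10" < "9" is true, while the numeric constants 10 < 9 give false.

```shell
#!/bin/sh
# Hypothetical helper: evaluate a few comparisons from the book's rule.
# Each result is stored in a variable first, because a bare ">" inside
# a print statement would be taken as output redirection.
string_compare_demo() {
    awk 'BEGIN {
        a = ("Canada" < "China")  # second character: "a" sorts before "h"
        b = ("Asia" < "Asian")    # a prefix sorts before the longer string
        c = ("10" < "9")          # two string constants: "1" sorts before "9"
        d = (10 < 9)              # two numeric constants: numeric comparison
        print a, b, c, d          # 1 means true, 0 means false
    }'
}

string_compare_demo   # prints: 1 1 1 0
```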
John

chapin@cbnewsc.att.com ( Tom Chapin ) (03/10/91)

Darwin Alonso writes:
>I have a question about how awk compares values with the greater-than
>and less-than operators (> and <). It seems to compare floating-point numbers
>if the field is a number, but compares ASCII values (?) if the field
>is not a number.
>Is this intentional?

Yes.  Any expression can have both a string value and a numeric value.
If both sides of a comparison have numeric values, the comparison
will be numeric; otherwise a string comparison will be made.
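Applied to the data from the original post, this rule accounts for both outputs. The sketch below assumes a POSIX awk; trace_comparisons is a hypothetical helper that prints each first field followed by the results of $1 > 4 and $1 < 4 (1 = true, 0 = false).

```shell
#!/bin/sh
# Hypothetical helper: for each input line, print $1 followed by the
# results of ($1 > 4) and ($1 < 4), where 1 means true and 0 means false.
# Fields that look like numbers ("3.9", "10.") are compared numerically;
# "This" does not, so 4 is converted to the string "4" and compared as text.
trace_comparisons() {
    awk '{ gt = ($1 > 4); lt = ($1 < 4); print $1, gt, lt }'
}

# A few lines taken from original.file in the post.
printf '%s\n' 'This is text' '3.9' '4.' '4.1' '10.' | trace_comparisons
```

This prints "This 1 0", "3.9 0 1", "4. 0 0", "4.1 1 0" and "10. 1 0". Since "T" sorts after "4" in ASCII, "This" > "4" is true and "This" < "4" is false, which is why the text line shows up only in the > 4 output. "4." equals 4 numerically, so it appears in neither.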

This default behavior can be modified by coercing the values:

	VARIABLE + 0	will coerce the expression to a number
	VARIABLE ""	will coerce it to a string

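Here is a sketch of those two coercions applied to the post's comparison, again assuming a POSIX awk; the helper names numeric_gt4 and string_gt4 are hypothetical. Forcing a number drops the text line; forcing a string makes every line compare as text, which has its own surprise.

```shell
#!/bin/sh
# Force a numeric comparison: $1 + 0 is always a number, so "This" becomes 0
# and the text line is no longer selected.
numeric_gt4() { awk '$1 + 0 > 4'; }

# Force a string comparison: $1 "" is always a string, so 4 is converted to
# "4" and every line is compared as text; "10." now sorts before "4".
string_gt4() { awk '$1 "" > 4'; }

printf '%s\n' 'This is text' '3.9' '4.1' '10.' | numeric_gt4   # 4.1 and 10.
printf '%s\n' 'This is text' '3.9' '4.1' '10.' | string_gt4    # This is text and 4.1
```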
If a string is coerced to a number, its numeric value will be
whatever number occurs at the start of the string, before the first
non-numeric character.  If the string does not begin with a number,
the numeric value will be zero.
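A quick check of the leading-number rule, assuming a POSIX awk; the helper name to_number is hypothetical. Only a number at the start of the string counts, so "abc42" converts to 0.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric value of each input line after
# coercing it with + 0. Only a leading number is used; otherwise the result is 0.
to_number() { awk '{ print $0 + 0 }'; }

printf '%s\n' '42abc' '3.5 apples' 'abc42' '' | to_number
# prints 42, 3.5, 0 and 0, one per line
```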

>I couldn't find anything
>on this in the awk book (Aho, Kernighan, Weinberger).

Try pages 25-26 and pages 44-45.

-- 
     tom chapin                att!hrccb!tjc         tjc@hrccb.att.com