[comp.unix.questions] More AWK tricks

martin@mwtech.UUCP (Martin Weitzel) (05/09/91)
In article <791@necssd.NEC.COM> harrison@necssd.NEC.COM (Mark Harrison) writes:
>In article <1817@wjvax.UUCP>, mario@wjvax.UUCP (Mario Dona) writes:

>> 1.  How to prevent blank lines from printing if there is nothing to print

>Add this line after your BEGIN rule:
>/^$/ {next} #skip blank lines
>
>If you want to skip lines that may have white space:
>/^[ \t]*$/ {next} #skip blank (non-text) lines

Though I don't think that is what the poster tried to ask, Mark gives
a good hint here on how to avoid unnecessary processing of empty input lines.
I have a further suggestion, which also isn't related to the original
question but turns out to be very handy on many occasions. Extend Mark's
proposal to

/^[ \t]*(#.*|[ \t]*)$/ { next; } #skip blank lines and comments

This lets you embed comments in the usual style (lines beginning
with '#') within data that should be processed by AWK. Of course this
is only applicable if you yourself write ALL the tools which process
the data. But as it seems that '#'-comments are allowed in many system
configuration files, I think it's good practice to stick to this style
if you want such a feature in your own tools. And as you see, it's
really easy. (There's no excuse to have *NO* such feature if the format
of the data you process doesn't come from outside but is your own design.)
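
As a minimal sketch of the rule in use (the shell function name
skip_comments and the sample data are invented for illustration):

```shell
# skip_comments: hypothetical helper, prints only the real data lines.
skip_comments() {
	awk '
	/^[ \t]*(#.*|[ \t]*)$/ { next; }	# skip blank lines and comments
	{ print; }				# everything else is data
	'
}
```

Fed the five lines "# header", an empty line, "foo 1", "  # indented"
and "bar 2", this should print just "foo 1" and "bar 2".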

It's not too hard to have even more sophisticated comment processing,
i.e. that '#'-comments can start anywhere within the line, but then it
gets less convenient and if you don't really *need* it (but just think
it's a nice feature), it's not worth it. (Mail me your solution if
you want, I'll put together a summary and select the shortest one.)

>> 2.  How to concatenate the city and zip fields as shown.
>
>To concatenate:
>
>	city_and_zip = city " " zip
>
>To strip trailing space from city before concatenating:
>
>	while (substr(city, length(city)) == " ")
>		city = substr(city, 1, length(city) - 1)

I'd also prefer the above for portability, but as NAWK becomes more and
more available I think that "gsub" will be the more convenient approach.
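
A sketch of the NAWK variant, assuming an awk that has sub/gsub (the
old one has neither). The helper name join_city_zip and the ':'-separated
input format are made up for illustration. As only one match is possible
at the end of the string, "sub" is already enough here:

```shell
# join_city_zip: hypothetical helper, reads lines of the form "city:zip".
join_city_zip() {
	awk -F: '
	{
		city = $1; zip = $2;
		sub(/[ \t]+$/, "", city);	# strip trailing white space
		print city " " zip;		# concatenate with one blank
	}'
}
```

For an input line like "Springfield   :12345" this should give
"Springfield 12345".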

>> 3.  [Capitalizing]
>
>This is doable, but not enjoyable.  There is more of a chance if
>you use nawk or gawk. Otherwise, make an array:
>
>uc["a"] = "A"  ... uc["z"] = "Z"
>lc["A"] = "a"  ... lc["Z"] = "z"

I would add here that you don't necessarily have to write 26 lines,
as you can initialize uc and lc in a loop like the one below. (OK, OK,
it's not totally portable, but at least it works on ASCII, and for
ISO 8859 any special characters of a foreign character set can be added
by hand after this loop.)

	# *****
	# NOTE: unportable code follows, ASCII character set is assumed!!
	# *****
	for (i = 65; i < 65+26; i++) {
		lc[sprintf("%c", i)] = sprintf("%c", i+32);
		uc[sprintf("%c", i+32)] = sprintf("%c", i);
	}

Or, if you are one of those who like it a bit more obscure, change the body
of the loop to:

	uc[lc[sprintf("%c", i)] = sprintf("%c", i+32)] = sprintf("%c", i);

>and loop for the length of the string:
>
>    if (uc[substr(str, i, 1)] == "")
>        newstr = newstr substr(str, i , 1)
>    else
>        newstr = newstr uc[substr(str, i, 1)]

It is probably worth the price (for performance reasons) to initialize
the array used for mapping completely, so that uc[x] == x whenever x
should remain unmapped. Though this requires a little more data space%,
the body of the loop then no longer has to check a condition each time
it is executed.
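
A rough sketch of what I mean, again assuming ASCII. The helper name
upcase is invented, and so is the choice to pre-load codes 1 to 127;
any character outside that range has no entry and is silently dropped,
so extend the first loop if your data needs more:

```shell
# upcase: hypothetical helper, maps every input line to upper case.
upcase() {
	awk '
	BEGIN {
		# NOTE: unportable, ASCII character set is assumed!!
		for (i = 1; i < 128; i++) {	# identity mapping first
			c = sprintf("%c", i);
			uc[c] = c;
		}
		for (i = 65; i < 65+26; i++)	# then override a..z
			uc[sprintf("%c", i+32)] = sprintf("%c", i);
	}
	{
		newstr = "";
		n = length($0);
		for (i = 1; i <= n; i++)	# no condition inside the loop
			newstr = newstr uc[substr($0, i, 1)];
		print newstr;
	}'
}
```

The line "Hello, world 42!" should come out as "HELLO, WORLD 42!".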

%: There's a perhaps little-known pitfall in AWK (at least in the old one):
When a statement like

	if (array[z] == "")

is executed, an entry for the index value z is made in array. (You'll
normally not notice this, as the contents of array[z] is the empty string,
but it gets in the way if you later have a for (a in array).) Because
of this peculiarity you save less data space than you may think when
you use the above construct, as non-existing index values of `uc' are
added by the test.
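
A small sketch to make the effect visible. The helper name count_entries
is invented; the membership test ("z" in b) is an assumption about your
awk, as it exists in NAWK and later versions but not in the old AWK:

```shell
# count_entries: hypothetical helper, prints the sizes of two arrays.
count_entries() {
	awk '
	BEGIN {
		a["a"] = "A"; b["a"] = "A";
		if (a["z"] == "") x = 1;	# the reference creates a["z"]
		if (!("z" in b)) y = 1;		# the membership test does not
		n = 0; for (k in a) n++;
		m = 0; for (k in b) m++;
		print n, m;
	}'
}
```

This should print "2 1": the array tested by reference has grown by
one entry, the other has not.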
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83