badri@valhalla.ee.rochester.edu (Badri Lokanathan) (02/14/88)
Well, after several years of shell script writing, I thought I knew
everything about it, but I was wrong!

Given a file of data in two columns, the columns separated by blanks.
A script is desired which, among other things, searches for a word in
column I and outputs the corresponding entry in column II.  There are
several ways of doing this; I want to know why the following
inconsistency took place (I tried it on BSD4.3):

#!/bin/sh
word=$1
result=`awk "/^${word}/{print \$2}" datafile`
echo $result
# This outputs the entire line, rather than the entry in column II.
awk "/^${word}/{print \$2}" datafile
# This outputs only the entry in column II, as expected.
######################### END SCRIPT ###########################
--
"I care about my fellow man           {)     badri@valhalla.ee.rochester.edu
Being taken for a ride,              //\\    {ames,caip,cmcl2,columbia,cornell,
I care that things start changing   ///\\\   harvard,ll-xn,rutgers,topaz}!
But there's no one on my side."-UB40  _||_   rochester!ur-valhalla!badri
chris@trantor.umd.edu (Chris Torek) (02/14/88)
In article <1159@valhalla.ee.rochester.edu> badri@valhalla.ee.rochester.edu
(Badri Lokanathan) writes:
>#!/bin/sh
>word=$1
>result=`awk "/^${word}/{print \$2}" datafile`
>echo $result
># This outputs the entire line, rather than the entry in the II column.

Yes.  Shell (sh, not csh) quoting is very easy to explain: "" quotes
against file name expansion; '' quotes against all expansion.  Any time
a line is evaluated, one level of quoting is removed.  Backquotes
evaluate the text inside the backquotes once.  Hence "\$2" becomes $2,
which becomes nothing, and awk prints the whole line.

Note that this means you can use backquotes inside backquotes:

	foo=`eval echo \`basename ...\``

This is not directly possible in csh.
--
In-Real-Life: Chris Torek, Univ of MD Computer Science, +1 301 454 7163
(hiding out on trantor.umd.edu until mimsy is reassembled in its new home)
Domain: chris@mimsy.umd.edu	Path: not easily reachable
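[Ed.: Chris's "one level of quoting is removed" rule can be watched directly.
A minimal sketch, assuming a POSIX sh and awk; the file name and its
contents are invented for the demo:]

```shell
# Reproduce both behaviors side by side.
set --                                  # clear positional params so $2 is empty
printf 'apple red\nbanana yellow\n' > qdemo.tmp
word=apple

# The backquotes strip one level of quoting: the subshell expands the
# empty positional $2, so awk receives "/^apple/{print }" and prints
# the whole matching line.
broken=`awk "/^${word}/{print \$2}" qdemo.tmp`

# Doubling the backslash survives the extra evaluation: awk receives
# "/^apple/{print $2}" and prints only the second column.
fixed=`awk "/^${word}/{print \\$2}" qdemo.tmp`

echo "broken: $broken"                  # broken: apple red
echo "fixed:  $fixed"                   # fixed:  red
rm -f qdemo.tmp
```

Running under "sh -x" shows the exact text each awk invocation receives.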
rupley@arizona.edu (John Rupley) (02/15/88)
In article <1159@valhalla.ee.rochester.edu>, badri@valhalla.ee.rochester.edu
(Badri Lokanathan) writes:
> Given a 2 column file of data in two columns, the columns separated by
> blanks. A script is desired which, among other things, searches for a
> word in the I column and outputs the corresponding entry in the II column.
> There are several ways of doing this; I want to know why the following
> inconsistency took place (I tried it on BSD4.3):
> #!/bin/sh
> word=$1
> result=`awk "/^${word}/{print \$2}" datafile`
> echo $result
> # This outputs the entire line, rather than the entry in the II column.
> awk "/^${word}/{print \$2}" datafile
> # This outputs only the entry in the II column, as expected.

Unless you escape the escape:

result=`awk "/^${word}/{print \\$2}" datafile`
                            >>^<<

the shell substitutes a null string for $2, and print by default sends
out the full line.  To watch what happens, run under "sh -x".

But you perhaps should avoid shell substitution inside an awk program.
The following does what I think you want to do, more simply and less
ambiguously:

#!/bin/sh
result=`awk '$1 == word {print $2}' word=$1 datafile`
echo $result
awk '$1 == word {print $2}' word=$1 datafile

John Rupley
uucp: ..{ihnp4 | hao!noao}!arizona!rupley!local
internet: rupley!local@megaron.arizona.edu
(H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
(O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929
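[Ed.: John's name=value form is worth exercising on its own.  A sketch
with an invented file; any POSIX awk accepts var=value assignments
among its file arguments:]

```shell
# The awk program is in single quotes, so no shell text is substituted
# into it and no quoting acrobatics are needed.  The assignment
# word=banana takes effect before lookup.tmp is read.
printf 'apple red\nbanana yellow\n' > lookup.tmp

result=`awk '$1 == word {print $2}' word=banana lookup.tmp`
echo "$result"                          # yellow
rm -f lookup.tmp
```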
wcs@ho95e.ATT.COM (Bill.Stewart) (02/26/88)
In article <1159@valhalla.ee.rochester.edu> badri@valhalla.ee.rochester.edu
(Badri Lokanathan) writes:
>Well, after several years of shell script writing, I thought I knew
>everything about it, but I was wrong!
>Given a 2 column file of data in two columns, the columns separated
>by blanks. A script is desired which, among other things, searches
>for a word in the I column and outputs the corresponding entry in
>the II column.

(I realize your question was about why your awk script didn't get
passed the correct arguments.)  But why use awk at all?  It's very
flexible, but much slower than egrep or sed.  For this application,
I'd recommend:

	: Usage: myname pattern file
	egrep "^$1 " $2 | cut -f2 -d" "

Even if you decide to use awk instead of cut to extract the second
column (and presumably do summaries or other useful work), you'll
speed the program up significantly by using egrep to reduce the
amount of data that awk has to process.

Alternatively, you can write it in shell (which won't be real fast
either):

	: Usage: myname pattern file
	pattern="$1"; shift
	cat $* | while read col1 col2
	do
		if [ "$col1" = "$pattern" ]
		then
			echo $col2
		fi
	done

If your shell doesn't provide test ([) as a builtin, use case instead.
--
# Thanks;
# Bill Stewart, AT&T Bell Labs 2G218, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs
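[Ed.: Bill's two variants can be tried side by side.  A sketch with an
invented file; note the egrep+cut version assumes the columns are
separated by exactly one space:]

```shell
printf 'apple red\nbanana yellow\n' > cols.tmp

# egrep + cut variant:
v1=`egrep "^apple " cols.tmp | cut -f2 -d" "`

# pure-shell variant; read splits each line into its two columns:
v2=`while read col1 col2
    do
        if [ "$col1" = "apple" ]
        then
            echo $col2
        fi
    done < cols.tmp`

echo "$v1 $v2"                          # red red
rm -f cols.tmp
```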
rupley@arizona.edu (John Rupley) (02/27/88)
In article <2004@ho95e.ATT.COM>, wcs@ho95e.ATT.COM (Bill.Stewart) writes:
> In article <1159@valhalla.ee.rochester.edu> badri@valhalla.ee.rochester.edu (Badri Lokanathan) writes:
> >Given a 2 column file of data in two columns, the columns separated
> >by blanks. A script is desired which, among other things, searches
                                                            ^ [please
                                                      note the plural]
> >for a word in the I column and outputs the corresponding entry in
> >the II column.
>
> For this application, I'd recommend
> 	: Usage: myname pattern file
> 	egrep "^$1 " $2 | cut -f2 -d" "

The above won't work, as it cuts at the first blank of a possible
series of whitespace characters.  The following fits the specification
and can be adapted to include other types of whitespace, eg, tabs:

	egrep "^$1 " $2 | tr -s " " " " | cut -f2 -d" "

This points up why awk is useful -- it has fewer gotchas of the above
nit-picky kind, and the code is straightforward:

	awk '$1 == patone {print $2}' patone="$1" $2

> Even if you decide to use awk instead of cut to extract the second
> column (and presumably do summaries or other useful work), you'll
> speed the program up significantly by using egrep to reduce the
> amount of data that awk has to process.

Right!  But only if the file to be searched is long.  For short files,
awk, with a single load and no pipes, is faster -- try it!  Also, even
for searches of large files, if you prototype in awk and then, if
execution time is a bore, recode (even in C (:-), you will probably
save effort.

> # Bill Stewart, AT&T Bell Labs 2G218, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs

John Rupley
uucp: ..{ihnp4 | hao!noao}!arizona!rupley!local
internet: rupley!local@megaron.arizona.edu
telex: 9103508679 (JARJAR)
(H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
(O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929
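[Ed.: the whitespace gotcha John describes is easy to reproduce.  A
sketch with an invented file; the double space between columns is
deliberate:]

```shell
# A line whose columns are separated by a RUN of blanks:
printf 'apple  red\n' > wide.tmp        # two spaces between the fields

# cut alone treats each space as a delimiter, so field 2 is the empty
# string between the two spaces:
bad=`egrep "^apple " wide.tmp | cut -f2 -d" "`

# Squeezing repeated blanks with tr -s first restores the expected result:
good=`egrep "^apple " wide.tmp | tr -s " " " " | cut -f2 -d" "`

# awk splits on any run of whitespace by default, so it needs no fixup:
best=`awk '$1 == w {print $2}' w=apple wide.tmp`

echo "bad=[$bad] good=[$good] best=[$best]"   # bad=[] good=[red] best=[red]
rm -f wide.tmp
```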