badri@valhalla.ee.rochester.edu (Badri Lokanathan) (02/14/88)
Well, after several years of shell script writing, I thought I knew
everything about it, but I was wrong!

Given a file of data in two columns, the columns separated by blanks.
A script is desired which, among other things, searches for a word in
column I and outputs the corresponding entry in column II.  There are
several ways of doing this; I want to know why the following
inconsistency took place (I tried it on BSD4.3):

#!/bin/sh
word=$1
result=`awk "/^${word}/{print \$2}" datafile`
echo $result
# This outputs the entire line, rather than the entry in column II.
awk "/^${word}/{print \$2}" datafile
# This outputs only the entry in column II, as expected.
######################### END SCRIPT ###########################
--
"I care about my fellow man           {)     badri@valhalla.ee.rochester.edu
Being taken for a ride,              //\\    {ames,caip,cmcl2,columbia,cornell,
I care that things start changing   ///\\\   harvard,ll-xn,rutgers,topaz}!
But there's no one on my side."-UB40  _||_   rochester!ur-valhalla!badri
chris@trantor.umd.edu (Chris Torek) (02/14/88)
In article <1159@valhalla.ee.rochester.edu> badri@valhalla.ee.rochester.edu
(Badri Lokanathan) writes:
>#!/bin/sh
>word=$1
>result=`awk "/^${word}/{print \$2}" datafile`
>echo $result
># This outputs the entire line, rather than the entry in the II column.

Yes.  Shell (sh, not csh) quoting is very easy to explain: "" quotes
against file name expansion; '' quotes against all expansion.  Any time
a line is evaluated, one level of quoting is removed.  Backquotes
evaluate the text inside the backquotes once.  Hence "\$2" becomes $2,
which becomes nothing, and awk prints the whole line.

Note that this means you can use backquotes inside backquotes:

	foo=`eval echo \`basename ...\``

This is not directly possible in csh.
--
In-Real-Life: Chris Torek, Univ of MD Computer Science, +1 301 454 7163
(hiding out on trantor.umd.edu until mimsy is reassembled in its new home)
Domain: chris@mimsy.umd.edu	Path: not easily reachable
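[Ed.: Chris's "one level of quoting is removed" rule can be watched directly.
A minimal sketch, assuming a POSIX sh and awk; the file name and its
contents are invented for the demo:]

```shell
# Reproduce both behaviors side by side.
set --                                  # clear positional params so $2 is empty
printf 'apple red\nbanana yellow\n' > qdemo.tmp
word=apple

# The backquotes strip one level of quoting: the subshell expands the
# empty positional $2, so awk receives "/^apple/{print }" and prints
# the whole matching line.
broken=`awk "/^${word}/{print \$2}" qdemo.tmp`

# Doubling the backslash survives the extra evaluation: awk receives
# "/^apple/{print $2}" and prints only the second column.
fixed=`awk "/^${word}/{print \\$2}" qdemo.tmp`

echo "broken: $broken"                  # broken: apple red
echo "fixed:  $fixed"                   # fixed:  red
rm -f qdemo.tmp
```

Running under "sh -x" shows the exact text each awk invocation receives.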
rupley@arizona.edu (John Rupley) (02/15/88)
In article <1159@valhalla.ee.rochester.edu>, badri@valhalla.ee.rochester.edu
(Badri Lokanathan) writes:
> Given a 2 column file of data in two columns, the columns separated by
> blanks. A script is desired which, among other things, searches for a
> word in the I column and outputs the corresponding entry in the II column.
> There are several ways of doing this; I want to know why the following
> inconsistency took place (I tried it on BSD4.3):
> #!/bin/sh
> word=$1
> result=`awk "/^${word}/{print \$2}" datafile`
> echo $result
> # This outputs the entire line, rather than the entry in the II column.
> awk "/^${word}/{print \$2}" datafile
> # This outputs only the entry in the II column, as expected.

Unless you escape the escape:

result=`awk "/^${word}/{print \\$2}" datafile`
                            >>^<<

the shell substitutes a null string for $2, and print by default sends
out the full line.  To watch what happens, run under "sh -x".

But you perhaps should avoid shell substitution inside an awk program.
The following does what I think you want to do, more simply and less
ambiguously:

#!/bin/sh
result=`awk '$1 == word {print $2}' word=$1 datafile`
echo $result
awk '$1 == word {print $2}' word=$1 datafile

John Rupley
uucp: ..{ihnp4 | hao!noao}!arizona!rupley!local
internet: rupley!local@megaron.arizona.edu
(H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
(O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929
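[Ed.: John's name=value form is worth exercising on its own.  A sketch
with an invented file; any POSIX awk accepts var=value assignments
among its file arguments:]

```shell
# The awk program is in single quotes, so no shell text is substituted
# into it and no quoting acrobatics are needed.  The assignment
# word=banana takes effect before lookup.tmp is read.
printf 'apple red\nbanana yellow\n' > lookup.tmp

result=`awk '$1 == word {print $2}' word=banana lookup.tmp`
echo "$result"                          # yellow
rm -f lookup.tmp
```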
wcs@ho95e.ATT.COM (Bill.Stewart) (02/26/88)
In article <1159@valhalla.ee.rochester.edu> badri@valhalla.ee.rochester.edu
(Badri Lokanathan) writes:
>Well, after several years of shell script writing, I thought I knew
>everything about it, but I was wrong!
>Given a 2 column file of data in two columns, the columns separated
>by blanks. A script is desired which, among other things, searches
>for a word in the I column and outputs the corresponding entry in
>the II column.

(I realize your question was about why your awk script didn't get
passed the correct arguments.)  But why use awk at all?  It's very
flexible, but much slower than egrep or sed.  For this application,
I'd recommend:

	: Usage: myname pattern file
	egrep "^$1 " $2 | cut -f2 -d" "

Even if you decide to use awk instead of cut to extract the second
column (and presumably do summaries or other useful work), you'll
speed the program up significantly by using egrep to reduce the
amount of data that awk has to process.

Alternatively, you can write it in shell (which won't be real fast
either):

	: Usage: myname pattern file
	pattern="$1"; shift
	cat $* | while read col1 col2
	do
		if [ "$col1" = "$pattern" ]
		then
			echo $col2
		fi
	done

If your shell doesn't provide test ([) as a builtin, use case instead.
--
# Thanks;
# Bill Stewart, AT&T Bell Labs 2G218, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs
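[Ed.: Bill's two variants can be tried side by side.  A sketch with an
invented file; note the egrep+cut version assumes the columns are
separated by exactly one space:]

```shell
printf 'apple red\nbanana yellow\n' > cols.tmp

# egrep + cut variant:
v1=`egrep "^apple " cols.tmp | cut -f2 -d" "`

# pure-shell variant; read splits each line into its two columns:
v2=`while read col1 col2
    do
        if [ "$col1" = "apple" ]
        then
            echo $col2
        fi
    done < cols.tmp`

echo "$v1 $v2"                          # red red
rm -f cols.tmp
```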
rupley@arizona.edu (John Rupley) (02/27/88)
In article <2004@ho95e.ATT.COM>, wcs@ho95e.ATT.COM (Bill.Stewart) writes:
> In article <1159@valhalla.ee.rochester.edu> badri@valhalla.ee.rochester.edu (Badri Lokanathan) writes:
> >Given a 2 column file of data in two columns, the columns separated
> >by blanks. A script is desired which, among other things, searches
                                                            ^ [please
                                                      note the plural]
> >for a word in the I column and outputs the corresponding entry in
> >the II column.
>
> For this application, I'd recommend
> 	: Usage: myname pattern file
> 	egrep "^$1 " $2 | cut -f2 -d" "

The above won't work, as it cuts at the first blank of a possible
series of whitespace characters.  The following fits the specification
and can be adapted to include other types of whitespace, eg, tabs:

	egrep "^$1 " $2 | tr -s " " " " | cut -f2 -d" "

This points up why awk is useful -- it has fewer gotchas of the above
nit-picky kind, and the code is straightforward:

	awk '$1 == patone {print $2}' patone="$1" $2

> Even if you decide to use awk instead of cut to extract the second
> column (and presumably do summaries or other useful work), you'll
> speed the program up significantly by using egrep to reduce the
> amount of data that awk has to process.

Right!  But only if the file to be searched is long.  For short files,
awk, with a single load and no pipes, is faster -- try it!  Also, even
for searches of large files, if you prototype in awk and then, if
execution time is a bore, recode (even in C (:-), you will probably
save effort.

> # Bill Stewart, AT&T Bell Labs 2G218, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs

John Rupley
uucp: ..{ihnp4 | hao!noao}!arizona!rupley!local
internet: rupley!local@megaron.arizona.edu
telex: 9103508679 (JARJAR)
(H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
(O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929
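[Ed.: the whitespace gotcha John describes is easy to reproduce.  A
sketch with an invented file; the double space between columns is
deliberate:]

```shell
# A line whose columns are separated by a RUN of blanks:
printf 'apple  red\n' > wide.tmp        # two spaces between the fields

# cut alone treats each space as a delimiter, so field 2 is the empty
# string between the two spaces:
bad=`egrep "^apple " wide.tmp | cut -f2 -d" "`

# Squeezing repeated blanks with tr -s first restores the expected result:
good=`egrep "^apple " wide.tmp | tr -s " " " " | cut -f2 -d" "`

# awk splits on any run of whitespace by default, so it needs no fixup:
best=`awk '$1 == w {print $2}' w=apple wide.tmp`

echo "bad=[$bad] good=[$good] best=[$best]"   # bad=[] good=[red] best=[red]
rm -f wide.tmp
```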