louxj@jacobs.cs.orst.edu (John W. Loux) (09/12/90)
Hi,

I'm having a problem with nawk (HP-UX 7.0, Series 300 model 50). The following `shar' document contains the files used to create an example `relational database system' described in chapter 4 of ``The AWK Programming Language'' by Aho, Kernighan and Weinberger. I understand that the nawk of HP-UX and the awk of this book are supposed to be the same program. I am using the book, by the way, because there is a troubling lack of documentation for the program amongst the standard HP-UX system documentation. If I have erred in either of the above observations, I trust that someone will rectify the situation.

The problem:

Upon running the qawk script (viz., qawk query), the program reports that all lines in the ``relfile'' that begin with tab characters are invalid according to the final conditional in the ``readrel'' function. After replacing all occurrences of \t in the patterns with literal tab characters, the ``relfile'' is read without reported error. Conclusion: nawk does not understand \t in the way that the book reports that it should. Not a major problem, but a major inconvenience.

Now, rerunning the program causes it to fail to terminate or to generate any sort of output. A little sleuthing reveals that the program never leaves the while loop of the doquery function. The loop properly matches all occurrences of variables and assigns 1's to the appropriately named elements of qattr. Unfortunately, when the variables are exhausted, the match function still returns 2, and thus the loop fails to terminate. I do believe that match is the culprit here, because match(s, /\$[A-Za-z]+/) returns 2 when s == " }". Though I haven't investigated it, 2 is also the position of the last valid match for the example query, so maybe match is simply failing to return 0 for a non-match and the 2 is just the result of the previous valid match. Either way, it's a serious deficiency.

I am very concerned about this because nawk was to be at the core of my data processing scripts and procedures.
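Both suspected faults can be checked in isolation, away from the rest of qawk. What follows is only a minimal sketch, assuming a POSIX-style sh. The AWK variable and the function names check_tab and check_match are illustrative and are not part of the archive below. An awk that behaves as the book describes should print "tab ok" and then "0 0 -1" (match returns 0, RSTART is 0, RLENGTH is -1 when nothing matches).

```shell
#!/bin/sh
# AWK is an illustrative variable: it defaults to plain "awk";
# run with AWK=nawk to exercise the HP-UX program instead.
AWK=${AWK:-awk}

# Check 1: does \t inside a regular expression match a literal tab?
# The pattern is the attribute test from readrel().
check_tab() {
    printf '\tcountry\n' | "$AWK" '{
        r = ($0 ~ /^[ \t]*[A-Za-z]+[ \t]*$/) ? "tab ok" : "tab broken"
        print r
    }'
}

# Check 2: what does match() report when there is nothing to match?
# " }" is the string left over once the example query is exhausted.
check_match() {
    "$AWK" 'BEGIN {
        s = " }"
        n = match(s, /\$[A-Za-z]+/)
        print n, RSTART, RLENGTH
    }'
}

check_tab
check_match
```

If the second check prints anything other than a leading 0, the while loop in doquery cannot terminate on its own, which would agree with the behaviour described above.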
I'm pretty upset that I can't get it to work properly. I hope someone can provide me with a solution to my problems or at least ensure that the powers that be at HP will fix the problems in the next software revision.

Again, I am fully aware that I may be in error in my assumptions and observations. Correcting my ignorance or error is also a valid solution to the problem.

Thank you for any consideration that you can give to this.

John W. Loux
Solve and Integrate Corp.
PO Box 1928
Corvallis OR, 97339-1928
(503) 754-1207
louxj@cs.orst.edu
solvint!john@cs.orst.edu

# This is a shell archive.  Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by John W. Loux <john@solvint> on Thu Sep  6 09:50:05 1990
#
# This archive contains:
#   capitals   countries   qawk   qawk.nawk
#   query      relfile
#

LANG=""; export LANG
PATH=/bin:/usr/bin:$PATH; export PATH

echo x - capitals
sed 's/^@//' >capitals <<'@EOF'
USSR	Moscow
Canada	Ottawa
China	Beijing
USA	Washington
Brazil	Brasilia
India	New Delhi
Mexico	Mexico City
France	Paris
Japan	Tokyo
Germany	Bonn
England	London
@EOF

chmod 644 capitals

echo x - countries
sed 's/^@//' >countries <<'@EOF'
USSR	8649	275	Asia
Canada	3852	25	North America
China	3705	1032	Asia
USA	3615	237	North America
Brazil	3286	134	South America
India	1267	746	Asia
Mexico	762	78	North America
France	211	55	Europe
Japan	144	120	Asia
Germany	96	61	Europe
England	94	56	Europe
@EOF

chmod 644 countries

echo x - qawk
cat >qawk <<'@EOF'
nawk -f qawk.nawk $*
@EOF

chmod 755 qawk

echo x - qawk.nawk
cat >qawk.nawk <<'@EOF'
# qawk - awk relational database query processor

BEGIN { readrel("relfile") }
/./   { doquery($0) }

function readrel(f) {
    while (getline <f > 0)   # parse relfile
        if ($0 ~ /^[A-Za-z]+ *:/) {     # name:
            gsub(/[^A-Za-z]+/, "", $0)  # remove all but name
            relname[++nrel] = $0
        } else if ($0 ~ /^[ \t]*!/)     # !command ...
            cmd[nrel, ++ncmd[nrel]] = substr($0,index($0,"!")+1)
        else if ($0 ~ /^[ \t]*[A-Za-z]+[ \t]*$/)  # attribute
            attr[nrel, $1] = ++nattr[nrel]
        else if ($0 !~ /^[ \t]*$/)      # not white space
            print "bad line in relfile:", $0
}

function doquery(s,   i,j) {
    for (i in qattr)  # clean up for next query
        delete qattr[i]
    query = s    # put $names in query into qattr, without $
    while (match(s, /\$[A-Za-z]+/)) {
        qattr[substr(s, RSTART+1, RLENGTH-1)] = 1
        s = substr(s, RSTART+RLENGTH+1)
    }
    for (i = 1; i <= nrel && !subset(qattr, attr, i); )
        i++
    if (i > nrel)     # didn't find a table with all attributes
        missing(qattr)
    else {            # table i contains attributes in query
        for (j in qattr)   # create awk program
            gsub("\\$" j, "$" attr[i,j], query)
        for (j = 1; j <= ncmd[i]; j++)  # create table i
            if (system(cmd[i, j]) != 0) {
                print "command failed, query skipped\n", cmd[i,j]
                return
            }
        awkcmd = sprintf("nawk -F'\t' '%s' %s", query, relname[i])
        printf("query: %s\n", awkcmd)   # for debugging
        system(awkcmd)
    }
}

function subset(q, a, r,   i) { # is q a subset of a[r]?
    for (i in q)
        if (!((r,i) in a))
            return 0
    return 1
}

function missing(x,   i) {
    print "no table contains all of the following attributes:"
    for (i in x)
        print i
}
@EOF

chmod 644 qawk.nawk

echo x - query
cat >query <<'@EOF'
$continent ~ /Asia/ { print $country, $population }
@EOF

chmod 644 query

echo x - relfile
cat >relfile <<'@EOF'
countries:
	country
	area
	population
	continent
capitals:
	country
	capital
cc:
	country
	area
	population
	continent
	capital
	!sort countries >temp.countries
	!sort capitals >temp.capitals
	!join temp.countries temp.capitals >cc
@EOF

chmod 644 relfile

exit 0
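
One possible way around the \t problem, short of pasting literal tabs into the patterns, is to build the tab at run time and use dynamic regular expressions. This is only a sketch under stated assumptions: that sprintf("%c", 9) yields a tab on the awk in question, and that strings are accepted as patterns on the right of ~. The names classify_lines, TAB and WS are made up for illustration and do not appear in qawk.nawk. The function prints which branch of readrel() each input line would take.

```shell
#!/bin/sh
# AWK is an illustrative variable; set AWK=nawk to try the HP-UX program.
AWK=${AWK:-awk}

# classify_lines: read relfile-style lines on stdin and print the kind
# of each one, using the same tests as readrel() but without \t.
classify_lines() {
    "$AWK" '
    BEGIN {
        TAB = sprintf("%c", 9)      # character code 9 is the tab
        WS  = "[ " TAB "]*"         # optional run of blanks or tabs
    }
    $0 ~ /^[A-Za-z]+ *:/             { print "name";      next }
    $0 ~ ("^" WS "!")                { print "command";   next }
    $0 ~ ("^" WS "[A-Za-z]+" WS "$") { print "attribute"; next }
    $0 !~ ("^" WS "$")               { print "bad";       next }
                                     { print "blank" }
    '
}

# Example: a name, a tab-indented attribute, and a tab-indented command.
printf 'cc:\n\tcountry\n\t!sort capitals >temp.capitals\n' | classify_lines
```

The same TAB and WS strings could be set up in the BEGIN block of qawk.nawk and substituted into the three tests in readrel, leaving the rest of the program untouched.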