louxj@jacobs.cs.orst.edu (John W. Loux) (09/07/90)
Hi,
I'm having a problem with nawk (HP-UX 7.0, Series 300 model 50). The following
`shar' document contains the files used to create an example `relational
database system' described in chapter 4 of ``The AWK Programming Language'' by
Aho, Kernighan and Weinberger. I understand that the nawk of HP-UX and awk of
this book are supposed to be the same program. I am using the book, by the way,
because there is a troubling lack of documentation for the program amongst the
standard HP-UX system documentation. If I have erred in either of the above
observations, I trust that someone will rectify the situation.
The problem:
Upon running the qawk script (viz., qawk query), the program reports that all
lines in the ``relfile'' that begin with tab characters are invalid according
to the final conditional in the ``readrel'' function. After replacing all
occurrences of \t with literal tab characters, the ``relfile'' is read without
reported error. Conclusion: nawk does not understand \t in the way that the
book says it should. Not a major problem, but a major inconvenience.
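The \t question can be checked in isolation from qawk. The following is a
minimal sketch, not part of the archive below: it feeds one tab-prefixed
attribute line to the same pattern that readrel uses and reports whether it
matched. The AWK and tab_result shell variables are names assumed for
illustration; set AWK=nawk to test the HP-UX program.

```shell
# Which awk to test; defaults to "awk" (assumption: override with AWK=nawk).
AWK=${AWK:-awk}

# One line consisting of a literal tab followed by an attribute name,
# tested against the attribute pattern from readrel.
tab_result=$(printf '\tcountry\n' | "$AWK" '{
    if ($0 ~ /^[ \t]*[A-Za-z]+[ \t]*$/)
        print "attribute"
    else
        print "bad line"
}')

echo "$tab_result"
```

An awk that treats \t inside a regular expression as a tab, as the book
describes, should print "attribute". A result of "bad line" would point at the
escape handling rather than at the relfile itself.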
Now, rerunning the program causes it to fail to terminate or to generate any
sort of output. A little sleuthing reveals that the program never leaves the
while loop of the doquery function. The loop properly matches all occurrences
of variables and assigns 1's to the appropriately named elements of qattr.
Unfortunately, when the variables are exhausted, the match function still
returns 2 and thus the loop fails to terminate. I do believe that match is
the culprit here because match(s, /\$[A-Za-z]+/) returns 2 when s == " }".
Though I haven't investigated it, 2 is also the last valid match for the
example query, so maybe match is simply failing to return 0 for a non-match
and the 2 is just the result of the previous valid match. Either way, it's a
serious deficiency.
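The two explanations above (a wrong return value, or a stale value left over
from the previous match) can be told apart with a small test outside qawk.
This is only a sketch under stated assumptions: the AWK and match_result shell
variables are illustrative names, and the test strings are taken from the
example query rather than from a real run.

```shell
# Which awk to test; defaults to "awk" (assumption: override with AWK=nawk).
AWK=${AWK:-awk}

match_result=$("$AWK" 'BEGIN {
    # A string that does contain a $name: the match should start at 1.
    match("$continent", /\$[A-Za-z]+/)
    print RSTART, RLENGTH

    # The string left over when the variables are exhausted: no $name here.
    r = match(" }", /\$[A-Za-z]+/)
    print r, RSTART, RLENGTH
}')

echo "$match_result"
```

By the book's description of match, the second line should report a return
value of 0 with RSTART 0 and RLENGTH -1. If the second line instead repeats
values from an earlier successful match, that supports the stale-result
explanation.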
I am very concerned about this because nawk was to be at the core of my data
processing scripts and procedures. I'm pretty upset that I can't get it to work
properly. I hope someone can provide me with a solution to my problems or at
least ensure that the powers that be at HP will fix the problems in the next
software revision.
Again, I am fully aware that I may be in error in my assumptions and
observations. Correcting my ignorance or error is also a valid solution
to the problem.
Thank you for any consideration that you can give to this.
John W. Loux
Solve and Integrate Corp.
PO Box 1928
Corvallis, OR 97339-1928
(503) 754-1207
louxj@cs.orst.edu
solvint!john@cs.orst.edu
# This is a shell archive. Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by John W. Loux <john@solvint> on Thu Sep 6 09:50:05 1990
#
# This archive contains:
# capitals countries qawk qawk.nawk
# query relfile
#
LANG=""; export LANG
PATH=/bin:/usr/bin:$PATH; export PATH
echo x - capitals
sed 's/^@//' >capitals <<'@EOF'
USSR Moscow
Canada Ottawa
China Beijing
USA Washington
Brazil Brasilia
India New Delhi
Mexico Mexico City
France Paris
Japan Tokyo
Germany Bonn
England London
@EOF
chmod 644 capitals
echo x - countries
sed 's/^@//' >countries <<'@EOF'
USSR 8649 275 Asia
Canada 3852 25 North America
China 3705 1032 Asia
USA 3615 237 North America
Brazil 3286 134 South America
India 1267 746 Asia
Mexico 762 78 North America
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe
@EOF
chmod 644 countries
echo x - qawk
cat >qawk <<'@EOF'
nawk -f qawk.nawk $*
@EOF
chmod 755 qawk
echo x - qawk.nawk
cat >qawk.nawk <<'@EOF'
# qawk - awk relational database query processor

BEGIN { readrel("relfile") }
/./   { doquery($0) }

function readrel(f) {
    while (getline <f > 0)    # parse relfile
        if ($0 ~ /^[A-Za-z]+ *:/) {    # name:
            gsub(/[^A-Za-z]+/, "", $0)    # remove all but name
            relname[++nrel] = $0
        } else if ($0 ~ /^[ \t]*!/)    # !command ...
            cmd[nrel, ++ncmd[nrel]] = substr($0,index($0,"!")+1)
        else if ($0 ~ /^[ \t]*[A-Za-z]+[ \t]*$/)    # attribute
            attr[nrel, $1] = ++nattr[nrel]
        else if ($0 !~ /^[ \t]*$/)    # not white space
            print "bad line in relfile:", $0
}

function doquery(s,i,j) {
    for (i in qattr)    # clean up for next query
        delete qattr[i]
    query = s    # put $names in query into qattr, without $
    while (match(s, /\$[A-Za-z]+/)) {
        qattr[substr(s, RSTART+1, RLENGTH-1)] = 1
        s = substr(s, RSTART+RLENGTH+1)
    }
    for (i = 1; i <= nrel && !subset(qattr, attr, i); )
        i++
    if (i > nrel)    # didn't find a table with all attributes
        missing(qattr)
    else {    # table i contains attributes in query
        for (j in qattr)    # create awk program
            gsub("\\$" j, "$" attr[i,j], query)
        for (j = 1; j <= ncmd[i]; j++)    # create table i
            if (system(cmd[i, j]) != 0) {
                print "command failed, query skipped\n", cmd[i,j]
                return
            }
        awkcmd = sprintf("nawk -F'\t' '%s' %s", query, relname[i])
        printf("query: %s\n", awkcmd)    # for debugging
        system(awkcmd)
    }
}

function subset(q, a, r, i) {    # is q a subset of a[r]?
    for (i in q)
        if (!((r,i) in a))
            return 0
    return 1
}

function missing(x,i) {
    print "no table contains all of the following attributes:"
    for (i in x)
        print i
}
@EOF
chmod 644 qawk.nawk
echo x - query
cat >query <<'@EOF'
$continent ~ /Asia/ { print $country, $population }
@EOF
chmod 644 query
echo x - relfile
cat >relfile <<'@EOF'
countries:
	country
	area
	population
	continent
capitals:
	country
	capital
cc:
	country
	area
	population
	continent
	capital
	!sort countries >temp.countries
	!sort capitals >temp.capitals
	!join temp.countries temp.capitals >cc
@EOF
chmod 644 relfile
exit 0