[comp.sys.hp] Problem with HP-UX nawk on 9000 model 350

louxj@jacobs.cs.orst.edu (John W. Loux) (09/07/90)

Hi,

I'm having a problem with nawk (HP-UX 7.0, Series 300 model 350).  The following
`shar' document contains the files used to create an example `relational
database system' described in chapter 4 of ``The AWK Programming Language'' by
Aho, Kernighan and Weinberger.  I understand that the nawk of HP-UX and awk of
this book are supposed to be the same program.  I am using the book, by the way,
because there is a troubling lack of documentation for the program amongst the
standard HP-UX system documentation.  If I have erred in either of the above
observations, I trust that someone will rectify the situation.

The problem:

  Upon running the qawk script (viz., qawk query), the program reports that all
  lines in the ``relfile'' that begin with tab characters are invalid according
  to the final conditional in the ``readrel'' function.  After replacing all
  occurrences of \t with literal tab characters, the ``relfile'' is read without
  reported error.  Conclusion: nawk does not understand \t in the way that the
  book reports that it should.  Not a major problem but a major inconvenience.
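
  A minimal check of the \t behaviour, independent of qawk, might look like
  the sketch below.  It reuses the attribute pattern from readrel and builds
  the tab with sprintf("%c", 9) so that the test data does not itself depend
  on the escape.  The AWK variable and the function name check_tab_escape are
  hypothetical conveniences, not part of the original scripts; set AWK=nawk
  to test the HP-UX program.

```shell
#!/bin/sh
# Sketch only: AWK and check_tab_escape are illustrative names.
AWK=${AWK:-awk}

check_tab_escape() {
    $AWK 'BEGIN {
        tab  = sprintf("%c", 9)       # a tab, built without the \t escape
        line = tab "country"          # same shape as an attribute line in relfile
        # Same pattern as the "attribute" test in readrel
        if (line ~ /^[ \t]*[A-Za-z]+[ \t]*$/)
            print "tab escape recognized"
        else
            print "tab escape NOT recognized"
    }'
}

check_tab_escape
```

  An awk that behaves as the book describes should report that the escape is
  recognized; the behaviour described above would give the other message.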

  Now, rerunning the program causes it to fail to terminate or to generate any
  sort of output.  A little sleuthing reveals that the program never leaves the
  while loop of the doquery function.  The loop properly matches all occurrences
  of variables and assigns 1's to the appropriately named elements of qattr.
  Unfortunately, when the variables are exhausted, the match function still
  returns 2 and thus the loop fails to terminate.  I do believe that match is
  the culprit here because match(s, /\$[A-Za-z]+/) returns 2 when s == " }".
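
  The match behaviour can be isolated in a few lines.  The sketch below calls
  match on the string " }" with the pattern from doquery and prints the return
  value together with RSTART and RLENGTH.  As before, AWK and check_match are
  hypothetical names introduced for the test; set AWK=nawk to run it against
  the HP-UX program.

```shell
#!/bin/sh
# Sketch only: AWK and check_match are illustrative names.
AWK=${AWK:-awk}

check_match() {
    $AWK 'BEGIN {
        s = " }"                          # contains no $name at all
        r = match(s, /\$[A-Za-z]+/)       # pattern used in doquery
        print r, RSTART, RLENGTH
    }'
}

check_match
```

  According to the book, a failed match returns 0 and sets RSTART to 0 and
  RLENGTH to -1, so the expected line is "0 0 -1".  Anything else supports
  the suspicion that match is at fault.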

  Though I haven't investigated it, 2 is also the last valid match for the
  example query, so maybe match is simply failing to return 0 for a non-match
  and the 2 is just the result of the previous valid match.  Either way, it's a
  serious deficiency.
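
  To see whether a stale value is being returned, the while loop of doquery
  can be traced on its own.  The sketch below runs the loop over the example
  query and prints RSTART and the extracted name on each pass.  The counter n
  and its limit of 10 are additions made only so that a misbehaving match
  cannot hang the test; AWK and trace_loop are likewise hypothetical names.

```shell
#!/bin/sh
# Sketch only: AWK, trace_loop and the iteration guard are illustrative.
AWK=${AWK:-awk}

trace_loop() {
    $AWK 'BEGIN {
        s = "$continent ~ /Asia/ { print $country, $population }"
        # Same loop as doquery, plus a guard against a non-terminating match
        while (match(s, /\$[A-Za-z]+/) && ++n <= 10) {
            print RSTART, substr(s, RSTART+1, RLENGTH-1)
            s = substr(s, RSTART+RLENGTH+1)
        }
        print "iterations:", n + 0
    }'
}

trace_loop
```

  With a correctly working match the trace shows three passes, with RSTART
  values of 1, 18 and 2, and then stops.  A count greater than 3 means match
  kept reporting success after the variables were exhausted.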


I am very concerned about this because nawk was to be at the core of my data
processing scripts and procedures.  I'm pretty upset that I can't get it to work
properly.  I hope someone can provide me with a solution to my problems or at
least ensure that the powers that be at HP will fix the problems in the next
software revision.

Again, I am fully aware that I may be in error in my assumptions and
observations.  Correcting my ignorance or error is also a valid solution
to the problem.

Thank you for any consideration that you can give to this.


John W. Loux
Solve and Integrate Corp.
PO Box 1928
Corvallis, OR 97339-1928
(503) 754-1207
louxj@cs.orst.edu
solvint!john@cs.orst.edu





# This is a shell archive.  Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by John W. Loux <john@solvint> on Thu Sep  6 09:50:05 1990
#
# This archive contains:
#	capitals	countries	qawk		qawk.nawk	
#	query		relfile		
#

LANG=""; export LANG
PATH=/bin:/usr/bin:$PATH; export PATH

echo x - capitals
sed 's/^@//' >capitals <<'@EOF'
USSR	Moscow
Canada	Ottawa
China	Beijing
USA	Washington
Brazil	Brasilia
India	New Delhi
Mexico	Mexico City
France	Paris
Japan	Tokyo
Germany	Bonn
England	London
@EOF

chmod 644 capitals

echo x - countries
sed 's/^@//' >countries <<'@EOF'
USSR	8649	275	Asia
Canada	3852	25	North America
China	3705	1032	Asia
USA	3615	237	North America
Brazil	3286	134	South America
India	1267	746	Asia
Mexico	762	78	North America
France	211	55	Europe
Japan	144	120	Asia
Germany	96	61	Europe
England	94	56	Europe
@EOF

chmod 644 countries

echo x - qawk
cat >qawk <<'@EOF'
nawk -f qawk.nawk $*
@EOF

chmod 755 qawk

echo x - qawk.nawk
cat >qawk.nawk <<'@EOF'
# qawk - awk relational database query processor

BEGIN { readrel("relfile") }
/./   { doquery($0) }

function readrel(f) {
    while (getline <f > 0)               # parse relfile
	if ($0 ~ /^[A-Za-z]+ *:/) {      # name:
	    gsub(/[^A-Za-z]+/, "", $0)   # remove all but name
	    relname[++nrel] = $0
	} else if ($0 ~ /^[ \t]*!/)      # !command ...
	    cmd[nrel, ++ncmd[nrel]] = substr($0,index($0,"!")+1)
	else if ($0 ~ /^[ \t]*[A-Za-z]+[ \t]*$/) # attribute
	    attr[nrel, $1] = ++nattr[nrel]
	else if ($0 !~ /^[ \t]*$/)       # not white space
	    print "bad line in relfile:", $0
}

function doquery(s,i,j) {

    for (i in qattr)    # clean up for next query
	delete qattr[i]
    query = s           # put $names in query into qattr, without $

    while (match(s, /\$[A-Za-z]+/)) {
	qattr[substr(s, RSTART+1, RLENGTH-1)] = 1
	s = substr(s, RSTART+RLENGTH+1)
    }

    for (i = 1; i <= nrel && !subset(qattr, attr, i); )
        i++
    if (i > nrel)       # didn't find a table with all attributes
	missing(qattr)
    else {		# table i contains attributes in query
	for (j in qattr)    # create awk program
	    gsub("\\$" j, "$" attr[i,j], query)
	for (j = 1; j <= ncmd[i]; j++)    # create table i
	    if (system(cmd[i, j]) != 0) {
		print "command failed, query skipped\n", cmd[i,j]
		return
	    }
	awkcmd = sprintf("nawk -F'\t' '%s' %s", query, relname[i])
	printf("query: %s\n", awkcmd)    # for debugging
	system(awkcmd)
    }
}

function subset(q, a, r, i) {    # is q a subset of a[r]?
    for (i in q)
	if (!((r,i) in a))
	    return 0
    return 1
}

function missing(x,i) {
    print "no table contains all of the following attributes:"
    for (i in x)
      print i
}



@EOF

chmod 644 qawk.nawk

echo x - query
cat >query <<'@EOF'
$continent ~ /Asia/ { print $country, $population }
@EOF

chmod 644 query

echo x - relfile
cat >relfile <<'@EOF'
countries:
	country
	area
	population
	continent
capitals:
	country
	capital
cc:
	country
	area
	population
	continent
	capital
	!sort countries >temp.countries
	!sort capitals >temp.capitals
	!join temp.countries temp.capitals >cc
@EOF

chmod 644 relfile

exit 0