[comp.sources.amiga] v02i011: bawk sources

doc@s.cc.purdue.edu (Craig Norborg) (08/09/87)
The enclosed shar archive holds the first part of the sources for bawk.
I got the original files from Fish disk 65. There are substantial changes.
Don't hesitate to mail me if there are any problems.
					Johan Widen
					jw@sics.se

#	This is a shell archive.
#	Remove everything above and including the cut line.
#	Then run the rest of the file through sh.
#----cut here-----cut here-----cut here-----cut here----#
#!/bin/sh
# Xshar: Extended Shell Archiver.
# This is part  1 out of  2.
# This archive created: Sat Aug  8 19:32:40 1987
# By: Craig Norborg (Purdue University Computing Center)
#	Run the following text with /bin/sh to create:
#	Makefile
#	README
#	bawk.doc
#	bawk.h
#	bawkparse.c
#	bawksym.c
#	example1
#	example3
#	link.cmd
#	tst2
cat << \SHAR_EOF > Makefile
CC	= lc
CFLAGS	= 
OBJ =		bawk.o bawkact.o bawkdo.o bawkpat.o bawksym.o bawkparse.o


.SUFFIXES : 
.SUFFIXES : .o .c

.c.o :
	$(CC) $(CFLAGS) $*.c

bawk :		$(OBJ)
		blink with link.cmd

$(OBJ) :	bawk.h
SHAR_EOF
cat << \SHAR_EOF > README
Changes as of 19-JUL-1987

I got this code from Fish disk number 65. Although the program is well
written, it did not run on the Amiga. I set out to get it running and have
by now, as usual, spent far too much time on it. I release this update in the
hope that someone else will do some work on it.

I am going to say a lot of negative things about Bawk here. This should not
be taken as criticism of the original author. As I said above, the program
was well written and it was fun to work with. The original program can be
characterized as 'almost correct, easy to understand, but slow'.
I much prefer to work with such code over 'fast but buggy' code.

Although you may not believe it at first, there is really a lot of
functionality in this program. However, Bawk is not awk. A lot of
functionality is missing, and Bawk is also slower. You can be the one to
change all that... 8-)

Here are some of the differences between awk and Bawk.

Regular expressions in Bawk are delimited by '@', not '/':
	@[Ff]oo@

The function 'print' is not implemented. You have to get by with 'printf'.
The reason that print is not yet implemented is that automatic
conversion between string values and other (e.g. numerical) values is not
yet implemented.

Assignment to fields (and other arrays) is not automatic. You have to use strcpy:
	Awk:	$1 = "foo"
	Bawk:	strcpy($1,"foo")

Redirection (printf "%s", $0 >file) is not implemented.

To match a field in awk you can say
	$1 ~ /[Ff]oo/
in Bawk you say
	match($1,@[Ff]oo@)

Arrays in Bawk are not associative. It would probably be a good idea to
remove the declarations from Bawk and to make type handling and array
handling more like awk.


Some minor changes:
Bawk can now take a command line pattern.
Here are three ways of invoking bawk:

$ bawk @[Ff]oo@ file
$ echo >xxx @[Ff]oo@
$ bawk -f xxx file
$ bawk - file
@[Ff]oo@
$ 

The braindamaged parsing of command line arguments on the Amiga forced me
to define an alternative string delimiter. Strings can now be delimited
by '`' (`a string`) as well as '"' ("a string"). This behaviour is
optional. If you do not like it then undefine QUOTE_STRING_HACK in bawk.h.
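
A minimal sketch of how a lexer could accept either delimiter follows.
The name scan_string is hypothetical and this is not the routine bawk
uses (the lexer is not shown in this part of the archive). Escape pairs
such as \" are copied through unchanged so that an escaped quote does not
end the string; interpreting them is left to a later step.

```c
#include <stddef.h>

/* Hypothetical sketch: scan a string constant that opens with either
 * '"' or '`' and ends with the same character.  The body is copied to
 * out (outsz must be at least 1; longer bodies are truncated).  A
 * backslash and the character after it are copied unchanged, so an
 * escaped quote does not end the string.  Returns the number of source
 * characters consumed, or 0 if src does not begin a complete string. */
size_t scan_string(const char *src, char *out, size_t outsz)
{
    char quote = src[0];
    size_t i = 1, n = 0;

    if (quote != '"' && quote != '`')
        return 0;
    while (src[i] != '\0' && src[i] != quote) {
        if (src[i] == '\\' && src[i + 1] != '\0') {
            if (n + 1 < outsz)
                out[n++] = src[i];
            i++;
        }
        if (n + 1 < outsz)
            out[n++] = src[i];
        i++;
    }
    out[n] = '\0';
    if (src[i] != quote)
        return 0;                   /* unterminated string */
    return i + 1;
}
```

Scanning `a string` and "a string" yields the same text, so rules typed
on the Amiga command line can use grave accents instead of double quotes.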

I have used the Lattice C compiler. Conversion to Manx with 32-bit ints
should be easy. The code assumes that sizeof(int) == sizeof(char *).

I used Fred Fish's dbug package to develop the current version. The name
dbug is a bit of a misnomer, 'trace' is a more appropriate name. dbug
implements tracing in an orderly manner. Very useful. The package is
available on Fish disk 41. An older version is on disk 2.

The trace code is conditionally compiled in depending on the definition
	DBUG_OFF
To compile with tracing, comment out the definition of DBUG_OFF in bawk.h.
You can then invoke tracing and dbug printing with
	bawk -\#t:d action file
Select printing only with
	bawk -\#d action file
Note that bawk will run slower and be a bit larger if you compile with
tracing enabled.

			Johan Widen
			USENET: jw@sics.se
SHAR_EOF
cat << \SHAR_EOF > bawk.doc
NAME

	bawk - text processor

SYNOPSIS

	bawk rules [file] ...
	bawk -f rulefile file ...

DESCRIPTION

	Bawk is a text processing program that searches files for
	specific patterns and performs "actions" for every occurrence
	of these patterns.  The patterns can be "regular expressions"
	as used in the UNIX "ex" editor.  The actions are expressed
	using a subset of the "C" language.

	By default Bawk will interpret the first argument as a rule.
	You can force bawk to take the rules from a file by using the
	-f option. The following arguments are taken to be the names of
	text files on which the rules are to be applied. The special file
	name "-" may also be used anywhere on the command line to take
	input from the standard input device.

	The command:

		bawk - prog.c - prog.h

	would read the pattern and action rules from the standard
	input, then apply them to the files "prog.c", the standard
	input and "prog.h" in that order.
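
	The command line conventions above can be sketched as follows.
	parse_args and enum rule_source are hypothetical names rather
	than bawk's own code; the function only decides where the rules
	come from and where the list of input files starts.  Each
	remaining argument is a file name, with "-" meaning the standard
	input.

```c
#include <string.h>

enum rule_source { RULES_INLINE, RULES_FILE, RULES_STDIN };

/* Hypothetical sketch of bawk's argument conventions:
 *   bawk rules file...      rules given on the command line
 *   bawk -f rulefile file...  rules read from a file
 *   bawk - file...          rules read from the standard input
 * Sets *src and *rules (rule text or rules-file name, NULL for stdin)
 * and returns the index of the first input-file argument, or -1 on a
 * usage error. */
int parse_args(int argc, char *argv[], enum rule_source *src,
               const char **rules)
{
    if (argc < 2)
        return -1;
    if (strcmp(argv[1], "-f") == 0) {
        if (argc < 3)
            return -1;          /* -f without a file name */
        *src = RULES_FILE;
        *rules = argv[2];
        return 3;
    }
    if (strcmp(argv[1], "-") == 0) {
        *src = RULES_STDIN;
        *rules = NULL;
        return 2;
    }
    *src = RULES_INLINE;
    *rules = argv[1];
    return 2;
}
```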

	The general format of a rules file is:

		<pattern> { <action> }
		<pattern> { <action> }
		...

	There may be any number of these <pattern> { <action> }
	sequences in the rules file.  Bawk reads a line of input from
	the current input file and applies every <pattern> { <action> }
	in sequence to the line.
	
	If the <pattern> corresponding to any { <action> } is missing,
	the action is applied to every line of input.  The default
	{ <action> } is to print the matched input line.

PATTERNS

	The <pattern>'s may consist of any valid C expression.  If the
	<pattern> consists of two expressions separated by a comma, it
	is taken to be a range and the <action> is performed on all
	lines of input that match the range.  <pattern>'s may contain
	"regular expressions" delimited by an '@' symbol.  Regular
	expressions can be thought of as a generalized "wildcard"
	string matching mechanism, similar to that used by many
	operating systems to specify file names.  Regular expressions
	may contain any of the following characters:

		x	An ordinary character (not mentioned below)
			matches that character.
		'\'	The backslash quotes any character.
			"\$" matches a dollar-sign.
		'^'	A circumflex at the beginning of an expression
			matches the beginning of a line.
		'$'	A dollar-sign at the end of an expression
			matches the end of a line.
		'.'	A period matches any single character except
			newline.
		':x'	A colon matches a class of characters described
			by the character following it:
		':a'	":a" matches any alphabetic;
		':d'	":d" matches digits;
		':n'	":n" matches alphanumerics;
		': '	": " matches spaces, tabs, and other control
			characters, such as newline.
		'*'	An expression followed by an asterisk matches
			zero or more occurrences of that expression:
			"fo*" matches "f", "fo", "foo", "fooo", etc.
		'+'	An expression followed by a plus sign matches
			one or more occurrences of that expression:
			"fo+" matches "fo", "foo", "fooo", etc.
		'-'	An expression followed by a minus sign
			optionally matches the expression.
		'[]'	A string enclosed in square brackets matches
			any single character in that string, but no
			others.  If the first character in the string
			is a circumflex, the expression matches any
			character except newline and the characters in
			the string.  For example, "[xyz]" matches "xx"
			and "zyx", while "[^xyz]" matches "abc" but not
			"axb".  A range of characters may be specified
			by two characters separated by "-".  Note that,
			[a-z] matches alphabetics, while [z-a] never
			matches.
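
	The rules for bracketed sets can be sketched as a small C
	function.  bracket_match is hypothetical, does not handle
	backslash escapes inside the brackets, and is not the matcher
	bawk uses (that code is not in this part of the archive).

```c
/* Hypothetical helper (not part of bawk): does character c match the
 * bracket body "set", i.e. the text between '[' and ']'?
 *   - a leading '^' negates the set, and a negated set never
 *     matches newline;
 *   - "x-y" is an inclusive range, so a reversed range such as
 *     "z-a" matches nothing;
 *   - any other character stands for itself. */
int bracket_match(const char *set, int c)
{
    int negate = 0, found = 0;

    if (*set == '^') {
        negate = 1;
        set++;
    }
    while (*set != '\0') {
        if (set[1] == '-' && set[2] != '\0') {
            if ((unsigned char)set[0] <= c && c <= (unsigned char)set[2])
                found = 1;
            set += 3;
        } else {
            if ((unsigned char)*set == c)
                found = 1;
            set++;
        }
    }
    if (negate)
        return c != '\n' && !found;
    return found;
}
```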

	For example, the following rules file would print every line
	that contained a valid C identifier:

		@[a-zA-Z_][a-zA-Z0-9_]*@

	And this rules file would print all lines between and including
	the ones that contained the word "START" and "END":

		@START@, @END@
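
	A range pattern needs one bit of state per rule, which bawk
	keeps in the startseen field of struct rule (see bawk.h).  The
	following hypothetical sketch follows the usual awk convention
	that the start and stop lines are both included and that a line
	matching both patterns forms a one-line range; bawk's own
	pattern code is not in this part of the archive, so treat that
	detail as an assumption.

```c
/* Hypothetical sketch of range-pattern selection ("@START@, @END@").
 * *startseen plays the role of the startseen flag in struct rule;
 * start_hit and stop_hit are the results of testing the current line
 * against the start and stop patterns.  Returns nonzero when the
 * action should run for this line. */
int range_selects(char *startseen, int start_hit, int stop_hit)
{
    if (!*startseen) {
        if (!start_hit)
            return 0;           /* still outside the range */
        *startseen = 1;         /* this line opens the range */
    }
    if (stop_hit)
        *startseen = 0;         /* this line closes the range */
    return 1;                   /* range lines are inclusive */
}
```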

ACTIONS

	Actions are expressed as a subset of the C language.  All
	variables are global and default to int's if not formally
	declared.  Variable declarations may appear anywhere within
	an action.  Only char's and int's and pointers and arrays of
	char and int are allowed.  Bawk allows only decimal integer
	constants to be used - no hex (0xnn) or octal (0nn). String
	and character constants may contain all of the special C
	escapes (\n, \r, etc.).

	Bawk supports the "if", "else", "while" and "break" flow of
	control constructs, which behave exactly as in C.

	Also supported are the following unary and binary operators,
	listed in order from highest to lowest precedence:

		operator           type    associativity
		() []              unary   left to right
		! ~ ++ -- - * &    unary   right to left
		* / %              binary  left to right
		+ -                binary  left to right
		<< >>              binary  left to right
		< <= > >=          binary  left to right
		== !=              binary  left to right
		&                  binary  left to right
		^                  binary  left to right
		|                  binary  left to right
		&&                 binary  left to right
		||                 binary  left to right
		=                  binary  right to left
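
	The binary rows of this table correspond to the
	operator_strength[] values in bawkparse.c (11 for * / %, 10 for
	+ -, down to 1 for =).  As a hypothetical illustration of how
	such a table drives parsing, here is a small precedence-climbing
	evaluator restricted to single-digit operands and the five
	arithmetic operators.  eval_expr is not part of bawk, which
	builds a parse tree instead of computing values while it parses.

```c
/* Binding strengths taken from operator_strength[] in bawkparse.c;
 * 0 means "not a binary operator" and stops the climb. */
static int strength(char op)
{
    switch (op) {
    case '*': case '/': case '%': return 11;
    case '+': case '-':           return 10;
    default:                      return 0;
    }
}

/* Evaluate operators that bind more tightly than "parent".
 * Operands are single decimal digits; *pp walks through the text.
 * No error checking (e.g. for division by zero) is done. */
static int climb(const char **pp, int parent)
{
    int lhs = **pp - '0';
    int s;

    (*pp)++;
    while ((s = strength(**pp)) > parent) {
        char op = **pp;
        int rhs;

        (*pp)++;
        /* Parsing the right operand at strength s stops at the next
         * operator of equal strength, giving left-to-right grouping. */
        rhs = climb(pp, s);
        switch (op) {
        case '*': lhs *= rhs; break;
        case '/': lhs /= rhs; break;
        case '%': lhs %= rhs; break;
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        }
    }
    return lhs;
}

int eval_expr(const char *text)
{
    return climb(&text, 0);
}
```

	For example, "2*3+4" evaluates to 10 and "9-3-2" to 4, because
	the loop keeps consuming operators that are weaker than the one
	just handled but still stronger than the caller's.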

	Comments are introduced by a '#' symbol and are terminated by
	the first newline character.  The standard "/*" and "*/"
	comment delimiters are not supported and will result in a
	syntax error.

FIELDS

	When bawk reads a line from the current input file, the
	record is automatically separated into "fields".  A field is
	simply a string of consecutive characters delimited by either
	the beginning or end of line, or a "field separator" character.
	Initially, the field separators are the space and tab character.
	The special unary operator '$' is used to reference one of the
	fields in the current input record (line).  The fields are
	numbered sequentially starting at 1.  The expression "$0"
	references the entire input line.
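
	Splitting a record into fields can be sketched as below.
	find_fields is a hypothetical helper; it records where each field
	starts instead of copying it, in the spirit of the Fields[]
	pointers into Linebuf declared in bawk.h.  It treats a run of
	separators as a single boundary, consistent with the example
	further down where $4 is "address" even though several spaces
	precede it; whether bawk does the same for other separator
	strings is an assumption.

```c
#include <string.h>

/* Hypothetical sketch: locate the fields of "line".  A field is a run
 * of characters not in "fs"; a newline or NUL ends the record.
 * start[i] and len[i] describe field i+1 (so $1 is start[0]).  At
 * most "max" fields are recorded.  Returns the field count, the value
 * NF would report. */
int find_fields(const char *line, const char *fs,
                int start[], int len[], int max)
{
    int n = 0, i = 0;

    while (line[i] != '\0' && line[i] != '\n' && n < max) {
        if (strchr(fs, line[i]) != NULL) {   /* skip separators */
            i++;
            continue;
        }
        start[n] = i;
        while (line[i] != '\0' && line[i] != '\n'
               && strchr(fs, line[i]) == NULL)
            i++;
        len[n] = i - start[n];
        n++;
    }
    return n;
}
```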

	Similarly, the "record separator" is used to determine the end
	of an input "line", initially the newline character.
	The field and record separators may be changed programmatically
	by one of the actions and will remain in effect until changed
	again.

	If the record separator is empty then an empty line will be taken
	as record separator and tab, space and newline will be used as
	field separators.
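
	The empty-RS case can be sketched as follows.  next_paragraph is
	a hypothetical helper that reads from a string rather than a
	file; once a record is extracted, its fields would be split on
	tab, space and newline as described above.

```c
#include <stddef.h>

/* Hypothetical sketch of "paragraph mode" (empty RS): records are
 * separated by one or more empty lines.  Copies the next record from
 * the string *pp into out (outsz must be at least 1), drops its
 * trailing newlines, advances *pp, and returns 0 once no record is
 * left.  Records longer than outsz - 1 characters are truncated. */
int next_paragraph(const char **pp, char *out, size_t outsz)
{
    const char *p = *pp;
    size_t n = 0;

    while (*p == '\n')                  /* skip blank lines */
        p++;
    if (*p == '\0')
        return 0;
    /* copy up to, but not including, the first empty line */
    while (*p != '\0' && !(p[0] == '\n' && p[1] == '\n')) {
        if (n + 1 < outsz)
            out[n++] = *p;
        p++;
    }
    while (n > 0 && out[n - 1] == '\n') /* last record may end in '\n' */
        n--;
    out[n] = '\0';
    *pp = p;
    return 1;
}
```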

	Fields behave exactly like strings and can be used in the same
	context as a character array.  These "arrays" can be considered
	to have been declared as:

		char ($n)[ 200 ];

	In other words, they are 200 bytes long.  Notice that the
	parentheses are necessary because the operators [] and $
	associate from right to left; without them, the statement
	would have parsed as:

		char $(1[ 200 ]);

	which is obviously ridiculous.

	If the contents of one of these field arrays is altered, the
	"$0" field will reflect this change.  For example, this
	expression:

		*$4 = 'A';

	will change the first character of the fourth field to an upper-
	case letter 'A'.  Then, when the following input line:

		120 PRINT "Name         address        Zip"

	is processed, it would be printed as:

		120 PRINT "Name         Address        Zip"

	Fields may also be modified with the strcpy() function (see
	below).  For example, the expression:

		strcpy( $4, "Addr." );

	applied to the same line above would yield:

		120 PRINT "Name         Addr.        Zip"
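
	Keeping $0 in step with a modified field amounts to replacing a
	slice of the line buffer and shifting the remainder, as in this
	hypothetical sketch (replace_field is not a bawk function, and
	the byte offsets are assumed to come from the field splitter).
	Assigning a single character, as in *$4 = 'A', needs no shifting
	at all.

```c
#include <string.h>

/* Hypothetical sketch: replace the field that starts at byte offset
 * "start" and is "len" bytes long with "text", shifting the rest of
 * the line so that the whole line ($0) reflects the change.  Assumes
 * start + len <= strlen(line).  Returns -1, leaving the line alone,
 * if the result would not fit in "size" bytes. */
int replace_field(char *line, size_t size, size_t start, size_t len,
                  const char *text)
{
    size_t tlen = strlen(text);
    size_t total = strlen(line);

    if (total - len + tlen + 1 > size)
        return -1;
    /* slide the tail, including the terminating NUL, into place */
    memmove(line + start + tlen, line + start + len,
            total - start - len + 1);
    memcpy(line + start, text, tlen);
    return 0;
}
```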

PREDEFINED VARIABLES

	The following variables are pre-defined:

		FS		Field separator (see FIELDS above).
		RS		Record separator (see FIELDS above).
		NF		Number of fields in current input
				record (line).
		NR		Number of records processed thus far.
		FILENAME	Name of current input file.
		BEGIN		A special <pattern> that matches the
				beginning of input text, before the
				first record is read.
		END		A special <pattern> that matches the
				end of input text, after the last
				record has been read.

	Bawk also provides some useful builtin functions for string
	manipulation and printing:

		printf(arg..)	Exactly the printf() function from C.
		getline()	Reads the next record from the current
				input file and returns 0 on end of file.
		nextfile()	Closes out the current input file and
				begins processing the next file in the
				list (if any).
		strlen(s)	Returns the length of its string argument.
		strcpy(s,t)	Copies the string "t" to the string "s".
		strcmp(s,t)	Compares "s" to "t" and returns 0 if
				they match.
		toupper(c)	Returns its character argument converted
				to upper-case.
		tolower(c)	Returns its character argument converted
				to lower-case.
		match(s,@re@)	Compares the string "s" to the regular
				expression "re" and returns the number
				of matches found (zero if none).

EXAMPLES

	The following rules file will scan a C program, counting the
	number of mismatched parentheses, brackets, and braces.

		@[()\[\]{}]@
		{
			parens = parens + match( $0, @(@ );
			parens = parens - match( $0, @)@ );
			bracks = bracks + match( $0, @\[@ );
			bracks = bracks - match( $0, @]@ );
			braces = braces + match( $0, @{@ );
			braces = braces - match( $0, @}@ );
		}
		END { printf("parens=%d, brackets=%d, braces=%d\n",
				parens, bracks, braces );
		}

	This program will capitalize the first word in every sentence of
	a document:

		BEGIN
		{
			strcpy(RS,".");  # set record separator to a period
		}
		{
			if ( match( $1, @^[a-z]@ ) )
				*$1 = toupper( *$1 );
			printf( "%s\n", $0 );
		}
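
	A rough C equivalent, applied to a whole buffer instead of one
	record at a time, is sketched below (capitalize_sentences is a
	hypothetical name).  Like the bawk rules, it upper-cases the
	first character of a sentence only when that character is a
	lower-case letter.

```c
#include <ctype.h>

/* Hypothetical sketch: upper-case the first letter of every sentence
 * in s, treating '.' as the sentence (record) separator and skipping
 * the spaces, tabs and newlines that separate fields. */
void capitalize_sentences(char *s)
{
    int at_start = 1;               /* next word begins a sentence */

    for (; *s != '\0'; s++) {
        if (*s == '.')
            at_start = 1;
        else if (*s == ' ' || *s == '\t' || *s == '\n')
            continue;
        else if (at_start) {
            if (islower((unsigned char)*s))
                *s = (char)toupper((unsigned char)*s);
            at_start = 0;
        }
    }
}
```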

LIMITATIONS

	Bawk was originally written in BDS C, but every attempt was made
	to keep the code as portable as possible.  The program should
	be compilable with any "standard" C compiler.  On CP/M systems
	compiled with BDS C, bawk takes up about 24K.

	An input record may be no longer than 200 characters. If longer
	records are encountered, they terminate prematurely and the
	next record starts where the previous one was hacked off.
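
	The truncation behaviour can be illustrated with this
	hypothetical sketch, which reads newline-terminated records from
	a string into a fixed-size buffer.  read_record and its max
	parameter stand in for bawk's input code and MAXLINELEN; neither
	the name nor the handling of a newline that falls exactly at the
	limit is taken from bawk.

```c
#include <stddef.h>

/* Hypothetical sketch: copy at most "max" characters of the next
 * newline-terminated record from *pp into buf (which must hold
 * max + 1 bytes).  A longer record is cut short and its remainder
 * becomes the next record.  Returns 0 at end of input. */
int read_record(const char **pp, char *buf, size_t max)
{
    const char *p = *pp;
    size_t n = 0;

    if (*p == '\0')
        return 0;
    while (*p != '\0' && *p != '\n' && n < max)
        buf[n++] = *p++;
    if (*p == '\n')
        p++;                        /* the newline ends the record */
    buf[n] = '\0';
    *pp = p;
    return 1;
}
```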

	A single pattern or action statement may be no longer than about
	4K characters, excluding comments and whitespace.  Since the
	program is semi-compiled the tokenized version will probably
	wind up being smaller than the source code, so the 4K figure is
	only approximate.

AUTHOR

	Bob Brodt
	486 Linden Ave.
	Bogota, NJ 07603

ACKNOWLEDGEMENTS

	The concept for bawk (and 3/4 of the name!) was taken from
	the program "awk" written by Alfred V. Aho, Brian W. Kernighan
	and Peter J. Weinberger.  My apologies for any irreverences.

	The regular expression compiler/parser was borrowed from a
	program called "grep" and has been highly modified.  Grep is
	distributed by the DEC Users Society (DECUS) and is Copyright
	(C) 1980 by DECUS.  The author acknowledges DECUS with a nod of
	thanks for giving their general permission and okey-dokey to
	copy or modify the grep program.

	UNIX is a trademark of AT&T Bell Labs.
SHAR_EOF
cat << \SHAR_EOF > bawk.h
/*
 * Bawk constants and variable declarations.
 */
#include <stdlib.h>
#include <ctype.h>
/* #define ANSI_OFF */
#define DBUG_OFF
#include <dbug.h>

#ifdef BDS_C
#define EXTERN /* */
#else

#ifdef MAIN
#define EXTERN /* */
#else
#define EXTERN extern
#endif

#endif

/*
 * If QUOTE_STRING_HACK is defined then Bawk programs passed on the
 * command line may delimit strings with either `grave accent` or
 * "double quotes".
 */
#define QUOTE_STRING_HACK

/*
 * Table and buffer sizes
 */
#define MAXLINELEN	200	/* longest input line */
#define MAXWORDS	(MAXLINELEN/2)	/* max # of words in a line */
#define MAXWORKBUFLEN	4096	/* longest action or regular expression */
#define MAXVARTABSZ	50	/* max # of symbols */
#define MAXVARLEN	10	/* symbol name length */
#define MAXSTACKSZ	40	/* max value stack length (for expressions) */


/**********************************************************
 * Current Input File variables                           *
 **********************************************************/
/*
 * Current Input File pointer:
 */
#ifdef BDS_C
EXTERN char *Fileptr, Curfbuf[ BUFSIZ ];
#else
EXTERN FILE *Fileptr;
#endif
EXTERN char *Filename;		/* current input file name */
EXTERN int Linecount;		/* current input line number */
EXTERN int Recordcount;		/* record count */
/*
 * Working buffers.
 */
EXTERN char Linebuf[ MAXLINELEN+1 ];	/* current input line buffer */
EXTERN char *Fields[ MAXWORDS+1 ];	/* pointers to the words in Linebuf */
EXTERN int Fieldcount;			/* and the # of words */
EXTERN char Workbuf[ MAXWORKBUFLEN+1 ];	/* work area for C action and */
					/* regular expression parsers */

/**********************************************************
 * Regular Expression Parser variables                    *
 **********************************************************/
/*
 * Tokens:
 */
#define CHAR	1
#define BOL	2
#define EOL	3
#define ANY	4
#define CLASS	5
#define NCLASS	6
#define STAR	7
#define PLUS	8
#define MINUS	9
#define ALPHA	10
#define DIGIT	11
#define NALPHA	12
#define PUNCT	13
#define RANGE	14
#define ENDPAT	15


/**********************************************************
 * C Actions Interpreter variables                        *
 **********************************************************/
/*
 * Tokens:
 */
#define T_STRING	16	/* primaries: */
#define T_DOLLAR	17
#define T_REGEXP	18
#define T_REGEXP_ARG	19
#define T_CONSTANT	20
#define T_VARIABLE	21
#define T_FUNCTION	22
#define T_SEMICOLON	23	/* punctuation */
#define T_EOF		24
#define T_LBRACE	25
#define T_RBRACE	26
#define T_LPAREN	27
#define T_RPAREN	28
#define T_LBRACKET	29
#define T_RBRACKET	30
#define T_COMMA		31
#define T_ASSIGN	32	/* operators: */
#define T_STAR		33	/* *foo */
#define T_MUL		34
#define T_DIV		35
#define T_MOD		36
#define T_ADD		37
#define T_UMINUS	38	/* -foo */
#define T_SUB		39
#define T_SHL		40
#define T_SHR		41
#define T_LT		42
#define T_LE		43
#define T_GT		44
#define T_GE		45
#define T_EQ		46
#define T_NE		47
#define T_NOT		48
#define T_ADDROF	49	/* &foo */
#define T_AND		50
#define T_XOR		51
#define T_OR		52
#define T_LNOT		53
#define T_LAND		54
#define T_LOR		55
#define T_INCR		56
#define T_DECR		57
#define T_POSTINCR	58	/* foo++ */
#define T_POSTDECR	59	/* foo-- */
#define T_IF		60	/* keywords: */
#define T_ELSE		61
#define T_WHILE		62
#define T_BREAK		63
#define T_CHAR		64
#define T_INT		65
#define T_BEGIN		66
#define T_END		67
#define T_NF		68
#define T_NR		69
#define T_FS		70
#define T_RS		71
#define T_FILENAME	72
#define T_STATEMENT	73
#define T_DECLARE	74	/* char foo */
#define T_ARRAY_DECLARE	75	/* char foo[5] */

#define MAX_TOKEN	T_ARRAY_DECLARE

#ifndef DBUG_OFF
extern char *token_name[];
#endif

#define PATTERN	'P'	/* indicates C statement is within a pattern */
#define ACTION	'A'	/* indicates C statement is within an action */

/*
 * Symbol table
 */

struct variable {
	char	vname[ MAXVARLEN ];
	char	vclass;
	char	vsize;
	int	vlen;
	char	*vptr;
};
#define VARIABLE struct variable
EXTERN VARIABLE Vartab[ MAXVARTABSZ ], *Nextvar;
/* A variable may be redeclared. Is this a feature? Should we have block */
/* scoping? vardecl stores the redeclaration info. */
struct vardecl {
	VARIABLE *variable;
	char	vclass;
	char	vsize;
};
#define VARDECL struct vardecl

/*
 * Symbol Table values
 */
#define ACTUAL		0
#define LVALUE		1
#define BYTE		1
#define WORD		(sizeof(char *))

/*
 * Value stack
 */
union datum {
	int	ival;
	char 	*dptr;
	char	**ptrptr;
};
#define DATUM union datum
struct item {
	char	class;
	char	lvalue;
	char	size;
	DATUM	value;
};
#define ITEM struct item
EXTERN ITEM Stackbtm[ MAXSTACKSZ ], *Stackptr, *Stacktop;
/*
 * parse tree
 */
struct expr_node {
	struct expr_node *left;
	struct expr_node *right;
	char operator;
};
#define EXPR_NODE struct expr_node
/*
 * Miscellaneous
 */
EXTERN char *Actptr;	/* pointer into Workbuf during compilation */
EXTERN char Token;	/* current input token */
EXTERN DATUM Value;	/* and its value */
EXTERN char Saw_break;	/* set when break stmt seen */
EXTERN char Where;	/* indicates whether C stmt is a PATTERN or ACTION */
EXTERN char Fieldsep[128];	/* field separator */
EXTERN char Recordsep[128];	/* record separator */
EXTERN EXPR_NODE *Beginact;	/* BEGINning of input actions */
EXTERN EXPR_NODE *Endact;	/* END of input actions */

/**********************************************************
 * Rules structure                                        *
 **********************************************************/
struct rule {
	struct {
		EXPR_NODE *start;/* C statements that match pattern start */
		EXPR_NODE *stop;/* C statements that match pattern end */
		char startseen;	/* set if both a start and stop pattern */
				/* given and if an input line matched the */
				/* start pattern */
	} pattern;
	EXPR_NODE *action;	/* quasi-C statements parse tree */
	struct rule *nextrule;	/* pointer to next rule */
};
#define RULE struct rule
EXTERN RULE *Rules,		/* rule structures linked list head */
	*Rulep;			/* working pointer */


/**********************************************************
 * Miscellaneous                                          *
 **********************************************************/
/*
 * Error exit values (returned to command shell)
 */
#define USAGE_ERROR	1	/* error in invocation */
#define FILE_ERROR	2	/* file not found errors */
#define RECORD_ERROR	3	/* input record too long */
#define RE_ERROR	4	/* bad regular expression */
#define ACT_ERROR	5	/* bad C action stmt */
#define MEM_ERROR	6	/* out of memory errors */
/*
 * Functions that return something special:
 */
#ifdef ANSI_OFF
extern EXPR_NODE *act_compile();
extern VARIABLE *addvar();
extern void assignment();
extern char *cclass();
extern void compile();
extern EXPR_NODE *decl_parse();
extern EXPR_NODE *declist_parse();
extern void doaction();
extern int dopattern();
extern void endfile();
extern void error();
extern EXPR_NODE *expr_parse(), *expr_left_to_right_parse();
extern int fetchint();
extern char *fetchptr();
extern VARIABLE *findvar();
extern void function();
extern int getcharacter();
extern EXPR_NODE *get_expr_node();
extern int getline();
extern char *getmemory();
extern char *get_clear_memory();
extern char getoken();
extern void init_pop_array();
extern int instr();
extern int isfunction();
extern int iskeyword();
extern int match();
extern void newfile ();
extern int parse();
extern EXPR_NODE *pat_compile();
extern char *pmatch();
extern int pop();
extern int popint();
extern void postincdec();
extern void preincdec();
extern EXPR_NODE *primary_parse();
extern void process();
extern void push();
extern void pushint();
extern int re_compile();
extern void stmt_lex();
extern EXPR_NODE *stmt_parse();
extern void storeint();
extern void storeptr();
extern char *str_compile();
extern void syntaxerror();
extern int ungetcharacter();
extern void unparse();
extern void usage();
extern void walk_tree();
#else /* ANSI_OFF */
extern EXPR_NODE *act_compile(char *);
extern VARIABLE *addvar(char *);
extern void assignment(void);
extern char *cclass(char *);
extern void compile(void);
extern EXPR_NODE *decl_parse(int);
extern EXPR_NODE *declist_parse(void);
extern void doaction(EXPR_NODE *);
extern int dopattern(EXPR_NODE *);
extern void endfile(void);
extern void error(char *,int);
extern EXPR_NODE *expr_parse(void), *expr_left_to_right_parse(char);
extern int fetchint(char *);
extern char *fetchptr(char *);
extern VARIABLE *findvar(char *);
extern void function(int,EXPR_NODE *);
extern int getcharacter(void);
extern EXPR_NODE *get_expr_node(char);
extern int getline(void);
extern char *getmemory(unsigned);
extern char *get_clear_memory(unsigned);
extern char getoken(void);
extern void init_pop_array(void);
extern int instr(char,char *);
extern int isfunction(char *);
extern int iskeyword(char *);
extern int match(char *,char *);
extern void newfile (char *);
extern int parse(char *,char **,char *);
extern EXPR_NODE *pat_compile(char *);
extern char *pmatch(char *,char *,char *);
extern int pop(void);
extern int popint(void);
extern void postincdec(int);
extern void preincdec(int);
extern EXPR_NODE *primary_parse(void);
extern void process(void);
extern void push(char,char,char,DATUM *);
extern void pushint(int);
extern int re_compile(char *);
extern void stmt_lex(char *);
extern EXPR_NODE *stmt_parse(void);
extern void storeint(char *,int);
extern void storeptr(char *,char *);
extern char *str_compile(char *,char);
extern void syntaxerror(void);
extern int ungetcharacter(char);
extern void unparse(char **,int,char *,char *);
extern void usage(void);
extern void walk_tree(EXPR_NODE *);
#endif /* ANSI_OFF */
SHAR_EOF
sed 's/^X//' << \SHAR_EOF > bawkparse.c
X/*
X * Bawk C actions parser
X */
X#include <stdio.h>
X#include <string.h>
X#include "bawk.h"
X
Xstatic char operator_strength[] = {
X0,  /* 0 */
X0,  /* CHAR */
X0,  /* BOL */
X0,  /* EOL */
X0,  /* ANY */
X0,  /* CLASS */
X0,  /* NCLASS */
X0,  /* STAR */
X0,  /* PLUS */
X0,  /* MINUS */
X0,  /* ALPHA */
X0,  /* DIGIT */
X0,  /* NALPHA */
X0,  /* PUNCT */
X0,  /* RANGE */
X0,  /* ENDPAT */
X0,  /* T_STRING */
X0,  /* T_DOLLAR */
X0,  /* T_REGEXP */
X0,  /* T_REGEXP_ARG */
X0,  /* T_CONSTANT */
X0,  /* T_VARIABLE */
X0,  /* T_FUNCTION */
X0,  /* T_SEMICOLON */
X0,  /* T_EOF */
X0,  /* T_LBRACE */
X0,  /* T_RBRACE */
X0,  /* T_LPAREN */
X0,  /* T_RPAREN */
X0,  /* T_LBRACKET */
X0,  /* T_RBRACKET */
X0,  /* T_COMMA */
X1,  /* T_ASSIGN */
X0,  /* T_STAR */
X11, /* T_MUL */
X11, /* T_DIV */
X11, /* T_MOD */
X10, /* T_ADD */
X0,  /* T_UMINUS */
X10, /* T_SUB */
X9,  /* T_SHL */
X9,  /* T_SHR */
X8,  /* T_LT */
X8,  /* T_LE */
X8,  /* T_GT */
X8,  /* T_GE */
X7,  /* T_EQ */
X7,  /* T_NE */
X0,  /* T_NOT */
X0,  /* T_ADDROF */
X6,  /* T_AND */
X5,  /* T_XOR */
X4,  /* T_OR */
X0,  /* T_LNOT */
X3,  /* T_LAND */
X2,  /* T_LOR */
X0,  /* T_INCR */
X0,  /* T_DECR */
X0,  /* T_POSTINCR */
X0,  /* T_POSTDECR */
X0,  /* T_IF */
X0,  /* T_ELSE */
X0,  /* T_WHILE */
X0,  /* T_BREAK */
X0,  /* T_CHAR */
X0,  /* T_INT */
X0,  /* T_BEGIN */
X0,  /* T_END */
X0,  /* T_NF */
X0,  /* T_NR */
X0,  /* T_FS */
X0,  /* T_RS */
X0,  /* T_FILENAME */
X0,  /* T_STATEMENT */
X0,  /* T_DECLARE */
X0   /* T_ARRAY_DECLARE */
X};
X
XEXPR_NODE *stmt_parse()
X{
X	/*
X	 * Parse a statement.
X	 */
X	register EXPR_NODE *root = NULL, *end_pointer, *tmp;
X
X	DBUG_ENTER("stmt_parse");
X	switch ( Token )
X	{
X	case T_EOF:
X		break;
X	case T_CHAR:
X	case T_INT:
X		root = declist_parse();
X		break;
X	case T_LBRACE:
X		/*
X		 * parse a compound statement
X		 */
X		getoken();
X		while ( Token != T_RBRACE )
X		{
X			tmp = get_expr_node((char) T_STATEMENT);
X			if(!root) {
X				root = end_pointer = tmp;
X			} else {
X				end_pointer->right = tmp;
X				end_pointer = tmp;
X			}
X			end_pointer->left = stmt_parse();
X		}
X		if ( Token==T_RBRACE )
X			getoken();
X		break;
X	case T_IF:
X		/*
X		 * parse an "if-else" statement
X		 */
X		if ( getoken() != T_LPAREN )
X			syntaxerror();
X		getoken();
X		root = get_expr_node((char) T_IF);
X		root->left = end_pointer = get_expr_node((char) T_IF);
X		end_pointer->left = expr_parse();
X		if ( Token!=T_RPAREN )
X			syntaxerror();
X		getoken();
X		end_pointer->right = stmt_parse();
X		if ( Token==T_ELSE )
X		{
X			getoken();
X			root->right = stmt_parse();
X		}
X		break;
X	case T_WHILE:
X		/*
X		 * parse a "while" statement
X		 */
X		root = get_expr_node((char) T_WHILE);
X		if ( getoken() != T_LPAREN )
X			syntaxerror();
X
X		getoken();
X		root->left = expr_parse();
X		if ( Token!=T_RPAREN )
X			syntaxerror();
X
X		getoken();
X		root->right = stmt_parse();
X		break;
X	case T_BREAK:
X		/*
X		 * parse a "break" statement
X		 */
X		root = get_expr_node((char) T_BREAK);
X		getoken();
X		break;
X	case T_SEMICOLON:
X		break;
X	default:
X		root = expr_parse();
X	}
X
X	if ( Token==T_SEMICOLON )
X		getoken();
X	DBUG_RETURN(root);
X}	
X
XEXPR_NODE *expr_parse()
X{
X	register EXPR_NODE *root, *tmp;
X	register char strength;
X
X	DBUG_ENTER("expr_parse");
X	strength = operator_strength[T_ASSIGN];
X	root = expr_left_to_right_parse(strength);
X	if(strength == operator_strength[Token])
X	{
X		/* assignments are grouped right to left */
X		tmp = get_expr_node(Token);
X		tmp->left = root;
X		root = tmp;
X		getoken();
X		root->right = expr_parse();
X	}
X	DBUG_RETURN(root);
X}
X
XEXPR_NODE *expr_left_to_right_parse(parent_strength)
Xregister char parent_strength;
X{
X	register EXPR_NODE *root, *tmp;
X	register char strength; 
X
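X	DBUG_ENTER("expr_left_to_right_parse");
X	root = primary_parse();
X	/*
X	 * Keep absorbing binary operators that bind tighter than the
X	 * caller's operator.  The right operand is parsed at the current
X	 * operator's strength, so operators of equal strength group left
X	 * to right, and a weaker operator that still binds tighter than
X	 * the parent (e.g. the "+" in "a*b+c") is picked up by the next
X	 * iteration instead of being left unparsed.
X	 */
X	while(parent_strength < (strength = operator_strength[Token]))
X	{
X		tmp = get_expr_node(Token);
X		tmp->left = root;
X		root = tmp;
X		getoken();
X		root->right = expr_left_to_right_parse(strength);
X	}
X	DBUG_RETURN(root);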
X}
X
XEXPR_NODE *primary_parse()
X{
X	register EXPR_NODE *root = NULL, *end_pointer, *tmp;
X	register int lpar;
X
X	DBUG_ENTER("primary_parse");
X	switch ( Token )
X	{
X	case T_LPAREN:
X		/*
X		 * it's a parenthesized expression
X		 */
X		getoken();
X		root = expr_parse();
X		if ( Token!=T_RPAREN )
X			error( "missing ')'", ACT_ERROR );
X		getoken();
X		break;
X	case T_LNOT:
X	case T_NOT:
X	case T_INCR:
X	case T_DECR:
X	case T_DOLLAR:
X		root = get_expr_node(Token);
X		getoken();
X		root->left = primary_parse();
X		break;
X	case T_SUB:
X		root = get_expr_node((char) T_UMINUS);
X		getoken();
X		root->left = primary_parse();
X		break;
X	case T_MUL:
X		root = get_expr_node((char) T_STAR);
X		getoken();
X		root->left = primary_parse();
X		break;
X	case T_AND:
X		root = get_expr_node((char) T_ADDROF);
X		getoken();
X		root->left = primary_parse();
X		break;
X	case T_ADD:
X		getoken();
X		root = primary_parse();
X		break;
X	case T_CONSTANT:
X		root = get_expr_node(Token);
X		root->left = (EXPR_NODE *) getmemory(sizeof(DATUM));
X		((DATUM *) (root->left))->ival = Value.ival;
X		getoken();
X		break;
X	case T_FUNCTION:
X		root = get_expr_node(Token);
X		root->left = (EXPR_NODE *) getmemory(sizeof(DATUM));
X		((DATUM *) (root->left))->ival = Value.ival;
X		getoken();
X		if ( Token==T_LPAREN )
X		{
X			lpar = 1;
X			getoken();
X		}
X		else
X			lpar = 0;
X		/*
X		 * Parse arguments into a list of expressions.
X		 */
X		if ( Token!=T_RPAREN && Token!=T_EOF )
X		{
X			for ( ;; )
X			{
X				tmp = get_expr_node((char) T_FUNCTION);
X				if(!root->right) {
X					root->right = end_pointer = tmp;
X				} else {
X					end_pointer->right = tmp;
X					end_pointer = tmp;
X				}
X				end_pointer->left = expr_parse();
X				if((tmp = end_pointer->left) &&
X				   (tmp->operator == T_REGEXP))
X					tmp->operator = T_REGEXP_ARG;
X				if ( Token==T_COMMA )
X					getoken();
X				else
X					break;
X			}
X		}
X		if ( lpar )
X		{
X			if( Token!=T_RPAREN )
X				error( "missing ')'", ACT_ERROR );
X			else
X				getoken();
X		}
X		break;
X	case T_REGEXP:
X	case T_STRING:
X		root = get_expr_node(Token);
X		root->left = (EXPR_NODE *) getmemory(strlen(Value.dptr) + 1);
X		strcpy((char *) root->left, Value.dptr);
X		getoken();
X		break;
X	case T_NF:
X	case T_NR:
X	case T_FS:
X	case T_RS:
X	case T_FILENAME:
X	case T_BEGIN:
X	case T_END:
X		root = get_expr_node(Token);
X		getoken();
X		break;
X	case T_VARIABLE:
X		root = get_expr_node(Token);
X		root->left = (EXPR_NODE *) Value.dptr;
X		getoken();
X		break;
X	case T_EOF:
X		break;
X	default:
X		syntaxerror();
X	}
X	/*
X	 * a "[" means it's an array reference
X	 */
X	if ( Token==T_LBRACKET )
X	{
X		tmp = get_expr_node(Token);
X		tmp->left = root;
X		root = tmp;
X		getoken();
X		root->right = expr_parse();
X		if ( Token!=T_RBRACKET )
X			error( "missing ']'", ACT_ERROR );
X		getoken();
X	}
X
X	if ( Token==T_INCR || Token==T_DECR )
X	{
X		tmp = get_expr_node((char)
X				((Token==T_INCR) ? T_POSTINCR : T_POSTDECR));
X		tmp->left = root;
X		root = tmp;
X		getoken();	/* consume the postfix "++" or "--" */
X	}
X	DBUG_RETURN(root);
X}
X
Xvoid syntaxerror()
X{
X	DBUG_ENTER("syntaxerror");
X	error( "syntax error", ACT_ERROR );
X	DBUG_VOID_RETURN;
X}
SHAR_EOF
cat << \SHAR_EOF > bawksym.c
/*
 * Bawk C actions builtin functions, variable declaration, and
 * stack management routines.
 */
#include <stdio.h>
#include <string.h>
#include "bawk.h"

#define MAXARGS		10	/* max # of arguments to a builtin func */
#define F_PRINTF	1
#define F_GETLINE	2
#define F_STRLEN	3
#define F_STRCPY	4
#define F_STRCMP	5
#define F_TOUPPER	6
#define F_TOLOWER	7
#define F_MATCH		8
#define F_NEXTFILE	9

int isfunction( s )
register char *s;
{
	/*
	 * Compare the string "s" to a list of builtin functions
	 * and return its (non-zero) token number.
	 * Return zero if "s" is not a function.
	 */
	DBUG_ENTER("isfunction");
	switch(*s) {
	    case 'g':
		if ( !strcmp( s, "getline" ) )
			DBUG_RETURN(F_GETLINE);
		break;
	    case 'm':
		if ( !strcmp( s, "match" ) )
			DBUG_RETURN(F_MATCH);
		break;
	    case 'n':
		if ( !strcmp( s, "nextfile" ) )
			DBUG_RETURN(F_NEXTFILE);
		break;
	    case 'p':
		if ( !strcmp( s, "printf" ) )
			DBUG_RETURN(F_PRINTF);
		break;
	    case 's':
		if ( !strcmp( s, "strlen" ) )
			DBUG_RETURN(F_STRLEN);
		if ( !strcmp( s, "strcpy" ) )
			DBUG_RETURN(F_STRCPY);
		if ( !strcmp( s, "strcmp" ) )
			DBUG_RETURN(F_STRCMP);
		break;
	    case 't':
		if ( !strcmp( s, "toupper" ) )
			DBUG_RETURN(F_TOUPPER);
		if ( !strcmp( s, "tolower" ) )
			DBUG_RETURN(F_TOLOWER);
		break;
	    default:;
	}
	DBUG_RETURN(0);
}

int iskeyword( s )
register char *s;
{
	/*
	 * Compare the string "s" to a list of keywords and return its
	 * (non-zero) token number.  Return zero if "s" is not a keyword.
	 */
	DBUG_ENTER("iskeyword");
	switch(*s) {
	    case 'b':
		if ( !strcmp( s, "break" ) )
			DBUG_RETURN(T_BREAK);
		break;
	    case 'c':
		if ( !strcmp( s, "char" ) )
			DBUG_RETURN(T_CHAR);
		break;
	    case 'e':
		if ( !strcmp( s, "else" ) )
			DBUG_RETURN(T_ELSE);
		break;
	    case 'i':
		if ( !strcmp( s, "int" ) )
			DBUG_RETURN(T_INT);
		if ( !strcmp( s, "if" ) )
			DBUG_RETURN(T_IF);
		break;
	    case 'w':
		if ( !strcmp( s, "while" ) )
			DBUG_RETURN(T_WHILE);
		break;
	    case 'B':
		if ( !strcmp( s, "BEGIN" ) )
			DBUG_RETURN(T_BEGIN);
		break;
	    case 'E':
		if ( !strcmp( s, "END" ) )
			DBUG_RETURN(T_END);
		break;
	    case 'F':
		if ( !strcmp( s, "FS" ) )
			DBUG_RETURN(T_FS);
		if ( !strcmp( s, "FILENAME" ) )
			DBUG_RETURN(T_FILENAME);
		break;
	    case 'N':
		if ( !strcmp( s, "NF" ) )
			DBUG_RETURN(T_NF);
		if ( !strcmp( s, "NR" ) )
			DBUG_RETURN(T_NR);
		break;
	    case 'R':
		if ( !strcmp( s, "RS" ) )
			DBUG_RETURN(T_RS);
		break;
	    default:;
	}
	DBUG_RETURN(0);
}

void function( funcnum, arg_root )
register int funcnum;
register EXPR_NODE *arg_root;
{
	register int argc;
	int args[ MAXARGS ];	/* not register: the array is indexed */

	DBUG_ENTER("function");
	argc = 0;
	/*
	 * If there are any arguments, evaluate them and copy their values
	 * to a local array.
	 */
	for(; argc < MAXARGS && arg_root; arg_root = arg_root->right)
	{
		walk_tree(arg_root->left);
		args[ argc++ ] = popint();
	}
	switch ( funcnum )
	{
	case F_PRINTF:	/* just like the real printf() function */
		pushint( printf( (char *) args[0], args[1], args[2], args[3],
			 args[4], args[5], args[6], args[7], args[8],
			 args[9] ) );
		break;
	case F_GETLINE:
		/*
		 * Get the next line of input from the current input file
		 * and parse according to the current field separator.
		 * Don't forget to free up the previous line's words first...
		 */
		while ( Fieldcount )
			free( Fields[ --Fieldcount ] );
		pushint( getline() );
		Fieldcount = parse( Linebuf, Fields, Fieldsep );
		break;
	case F_STRLEN:	/* calculate length of string argument */
		pushint( strlen( (char *) args[0] ) );
		break;
	case F_STRCPY:	/* copy second string argument to first string */
		pushint( (int) strcpy( (char *) args[0], (char *) args[1] ) );
		break;
	case F_STRCMP:	/* compare two strings */
		pushint( strcmp( (char *) args[0], (char *) args[1] ) );
		break;
	case F_TOUPPER:	/* convert the character argument to upper case */
		pushint( toupper( args[0] ) );
		break;
	case F_TOLOWER:	/* convert the character argument to lower case */
		pushint( tolower( args[0] ) );
		break;
	case F_MATCH:	/* match a string argument to a regular expression */
		pushint( match( (char *) args[0], (char *) args[1] ) );
		break;
	case F_NEXTFILE:/* close current input file and process next file */
		endfile();
		pushint( 1 ); /* is this a correct value? jw */
		break;
	default:	/* oops! */
		error( "bad function call", ACT_ERROR );
	}
	DBUG_VOID_RETURN;
}

VARIABLE *
findvar( s )
register char *s;
{
	/*
	 * Search the symbol table for a variable whose name is "s".
	 */
	register VARIABLE *pvar;
	register int i;
	char name[ MAXVARLEN ];

	DBUG_ENTER("findvar");
	i = 0;
	while ( i < MAXVARLEN && (isalnum( *s ) || (*s == '_')))
		name[i++] = *s++;
	if ( i<MAXVARLEN )
		name[i] = 0;

	for ( pvar = Vartab; pvar<Nextvar; ++pvar )
	{
		if ( !strncmp( pvar->vname, name, MAXVARLEN ) )
			DBUG_RETURN(pvar);
	}
	DBUG_RETURN(NULL);
}

VARIABLE *
addvar( name )
register char *name;
{
	/*
	 * Add a new variable to symbol table and assign it default
	 * attributes (int name;)
	 */
	register int i;

	DBUG_ENTER("addvar");
	if ( Nextvar < Vartab + MAXVARTABSZ )
	{
		i = 0;
		while ( i<MAXVARLEN && (isalnum( *name ) || (*name == '_')))
			Nextvar->vname[i++] = *name++;
		if ( i<MAXVARLEN )
			Nextvar->vname[i] = 0;

		Nextvar->vclass = 0;
		Nextvar->vsize = WORD;
		Nextvar->vlen = 0;
		/*
		 * Allocate some new room
		 */
		Nextvar->vptr = get_clear_memory( WORD );
	}
	else
		error( "symbol table overflow", MEM_ERROR );

	DBUG_RETURN(Nextvar++);
}

EXPR_NODE *declist_parse()
{
	/*
	 * Parse a "char" or "int" statement.
	 */
	register char type;
	register EXPR_NODE *root, *end_pointer;

	DBUG_ENTER("declist_parse");
	type = Token;
	getoken();
	root = end_pointer = decl_parse( type );
	while ( Token==T_COMMA )
	{
		getoken();
		end_pointer->right = decl_parse( type );
		end_pointer = end_pointer->right;
	}
	if ( Token==T_SEMICOLON )
		getoken();
	DBUG_RETURN(root);
}

EXPR_NODE *decl_parse( type )
register int type;
{
	/*
	 * Parse an element of a "char" or "int" declaration list.
	 * The function stmt_compile() has already entered the variable
	 * into the symbol table as an integer, this routine simply changes
	 * the symbol's class, size or length according to the declaration.
	 * WARNING: The interpreter depends on the fact that pointers are
	 * the same length as ints.  If your machine uses longs for
	 * pointers either change the code or #define int long (or whatever).
	 */
	register char class, size;
	register VARIABLE *pvar;
	register VARDECL *pdecl;
	register EXPR_NODE *node;
	EXPR_NODE *action;

	DBUG_ENTER("decl_parse");
	if ( Token==T_MUL )
	{
		/*
		 * it's a pointer
		 */
		getoken();
		node = decl_parse( type );
		if(node->operator == T_DECLARE)
			((VARDECL *) (node->left))->vclass += 1;
		else
			((VARDECL *) (node->left->left))->vclass += 1;
	}
	else if ( Token==T_VARIABLE )
	{
		/*
		 * Simple variable so far.  The token value (in the global
		 * "Value" variable) is a pointer to the variable's symbol
		 * table entry.
		 */
		pdecl = (VARDECL *) getmemory(sizeof(VARDECL));
		pvar = (VARIABLE *) Value.dptr;
		getoken();
		class = 0;
		/*
		 * Compute its length
		 */
		if ( Token==T_LBRACKET )
		{
			/*
			 * It's an array.
			 */
			node = get_expr_node((char) T_ARRAY_DECLARE);
			node->left = action =
				get_expr_node((char) T_ARRAY_DECLARE);
			action->left = (EXPR_NODE *) pdecl;
			getoken();
			++class;
			/*
			 * Parse the dimension expression
			 */
			action->right = expr_parse();
			if ( Token!=T_RBRACKET )
				error( "missing ']'", ACT_ERROR );
			getoken();
		}
		else
		{
			/*
			 * It's a simple variable.
			 */
			node = get_expr_node((char) T_DECLARE);
			node->left = (EXPR_NODE *) pdecl;
		}
		size = (type==T_CHAR) ? BYTE : WORD;
		pdecl->variable = pvar;
		pdecl->vclass = class;
		pdecl->vsize = size;
	}
	else
		syntaxerror();

	DBUG_RETURN(node);
}

void assignment()
{
	/*
	 * Perform an assignment
	 */
	int ival;

	DBUG_ENTER("assignment");
	ival = popint();
	/*
	 * make sure we've got an lvalue
	 */
	if ( Stackptr->lvalue )
	{
		if ( Stackptr->class )
			movmem((char *) &ival, Stackptr->value.dptr, WORD );
		else
			movmem((char *) &ival, Stackptr->value.dptr,
			       Stackptr->size);
		pop();
		pushint( ival );
	}
	else
		error( "'=' needs an lvalue", ACT_ERROR );
	DBUG_VOID_RETURN;
}

int pop()
{
	/*
	 * Pop the stack and return the integer value
	 */
	DBUG_ENTER("pop");
	if ( Stackptr >= Stackbtm )
		DBUG_RETURN((Stackptr--)->value.ival);
	DBUG_RETURN(error( "stack underflow", ACT_ERROR ));
}

void push( pclass, plvalue, psize, pdatum )
register char pclass, plvalue, psize;
register DATUM *pdatum;
{
	/*
	 * Push item parts onto the stack
	 */
	DBUG_ENTER("push");
	if ( ++Stackptr <= Stacktop )
	{
		Stackptr->lvalue = plvalue;
		Stackptr->size = psize;
		if ( !(Stackptr->class = pclass)  &&  !plvalue )
			Stackptr->value.ival = pdatum->ival;
		else
			Stackptr->value.dptr = pdatum->dptr;
	}
	else
		error( "stack overflow", MEM_ERROR );
	DBUG_VOID_RETURN;
}

void pushint( intvalue )
register int intvalue;
{
	/*
	 * push an integer onto the stack
	 */
	DBUG_ENTER("pushint");
	if ( ++Stackptr <= Stacktop )
	{
		Stackptr->lvalue =
		Stackptr->class = 0;
		Stackptr->size = WORD;
		Stackptr->value.ival = intvalue;
	}
	else
		error( "stack overflow", MEM_ERROR );
	DBUG_VOID_RETURN;
}

int popint()
{
	/*
	 * Resolve the item on the top of the stack and return it
	 */
	register int intvalue;

	DBUG_ENTER("popint");
	if ( Stackptr->lvalue )
	{
		/*
		 * if it's a byte indirect, sign extend it
		 */
		if ( Stackptr->size == BYTE && !Stackptr->class )
			intvalue = *Stackptr->value.dptr;
		else
		{
			/*
			 * otherwise, it's an unsigned int
			 */
			intvalue = (int) (*Stackptr->value.ptrptr);
		}
		pop();
		DBUG_RETURN(intvalue);
	}
	else
	{
		/*
		 * else it's an ACTUAL, just pop it
		 */
		DBUG_RETURN(pop());
	}
}

SHAR_EOF
cat << \SHAR_EOF > example1
@[()\[\]{}]@
{
	parens = parens + match( $0, @(@ );
	parens = parens - match( $0, @)@ );
	bracks = bracks + match( $0, @\[@ );
	bracks = bracks - match( $0, @]@ );
	braces = braces + match( $0, @{@ );
	braces = braces - match( $0, @}@ );
}
END
{
	printf("parens=%d, brackets=%d, braces=%d\n", parens, bracks, braces );
}
SHAR_EOF
cat << \SHAR_EOF > example3
BEGIN{strcpy RS,""}
{printf "%d %d %s\n",NR,NF,$0}
END{printf "total %d\n",NR}
SHAR_EOF
cat << \SHAR_EOF > link.cmd
FROM LIB:c.o+bawk.o+bawkact.o+bawkdo.o+bawkpat.o+bawksym.o+bawkparse.o
TO bawk
LIB LIB:lc.lib+LIB:amiga.lib
MAP nil:
SHAR_EOF
cat << \SHAR_EOF > tst2
@[(]@
{
	printf("parens=%d\n", parens );
	parens = parens + match( $0, @(@ );
	printf("parens=%d\n", parens );
}
END
{
	printf("parens=%d\n", parens );
}
SHAR_EOF