doc@s.cc.purdue.edu (Craig Norborg) (08/09/87)
The enclosed shar archive holds the first part of the sources for bawk.
I got the original files from Fish disk 65. There are substantial changes.
Don't hesitate to mail me if there are any problems.

Johan Widen
jw@sics.se

# This is a shell archive.
# Remove everything above and including the cut line.
# Then run the rest of the file through sh.
#----cut here-----cut here-----cut here-----cut here----#
#!/bin/sh
# Xshar: Extended Shell Archiver.
# This is part 1 out of 2.
# This archive created: Sat Aug 8 19:32:40 1987
# By: Craig Norborg (Purdue University Computing Center)
# Run the following text with /bin/sh to create:
# Makefile
# README
# bawk.doc
# bawk.h
# bawkparse.c
# bawksym.c
# example1
# example3
# link.cmd
# tst2
cat << \SHAR_EOF > Makefile
CC = lc
CFLAGS =
OBJ = bawk.o bawkact.o bawkdo.o bawkpat.o bawksym.o bawkparse.o

.SUFFIXES :
.SUFFIXES : .o .c

.c.o :
	$(CC) $(CFLAGS) $*.c

bawk : $(OBJ)
	blink with link.cmd

$(OBJ) : bawk.h
SHAR_EOF
cat << \SHAR_EOF > README
Changes as of 19-JUL-1987

I got this code from Fish disk number 65. Although the program is well
written it did not run on the Amiga. I set out to get it running and have
by now, as usual, spent far too much time on it. I release this update in
the hope that someone else will do some work on it.

I am going to say a lot of negative things about Bawk here. This should
not be taken as criticism of the original author. As I said above the
program was well written and it was fun to work with it. The
characteristics of the original program are 'almost correct, easy to
understand, but slow'. I much prefer to work with such code as compared to
'fast but buggy' code.

Although you may not believe it at first, there is really a lot of
functionality in this program.

However, Bawk is not awk. There is a lot of functionality lacking, and
Bawk is also slower. You can be the one to change all that... 8-)

Here are some of the differences between awk and Bawk.

Regular expressions in Bawk are delimited by '@', not '/':
    @[Ff]oo@

The function 'print' is not implemented. You have to get by with 'printf'.
The reason that print is not yet implemented is that automatic conversion
between string values and other (e.g. numerical) values is not yet
implemented.

Assignment between arrays is not automatic. You have to use strcpy:
    Awk:    $1 = "foo"
    Bawk:   strcpy($1,"foo")

Redirection (printf "%s", $0 >file) is not implemented.

To match a field in awk you can say
    $1 ~ /[Ff]oo/
in Bawk you say
    match($1,@[Ff]oo@)

Arrays in Bawk are not associative.

It would probably be a good idea to remove the declarations from Bawk and
to make type handling and array handling more like awk.

Some minor changes:

Bawk can now take a command line pattern. Here are three ways of invoking
bawk:
    $ bawk @[Ff]oo@ file

    $ echo >xxx @[Ff]oo@
    $ bawk -f xxx file

    $ bawk - file
    @[Ff]oo@
    $

The braindamaged parsing of command line arguments on the Amiga forced me
to define an alternative string delimiter. Strings can now be delimited by
'`' (`a string`) as well as '"' ("a string"). This behaviour is optional.
If you do not like it then undefine QUOTE_STRING_HACK in bawk.h.

I have used the Lattice C compiler. Conversion to Manx with 32 bit ints
should be easy. The code assumes that sizeof(int) == sizeof(char *).

I used Fred Fish's dbug package to develop the current version. The name
dbug is a bit of a misnomer, 'trace' is a more appropriate name. dbug
implements tracing in an orderly manner. Very useful. The package is
available on Fish disk 41. An older version is on disk 2.

The trace code is conditionally compiled in depending on the definition
of DBUG_OFF. To compile with tracing, comment out the definition of
DBUG_OFF in bawk.h. You can then invoke tracing and dbug printing with
    bawk -\#t:d action file
Select printing only with
    bawk -\#d action file
Note that bawk will run slower and be a bit larger if you compile with
tracing enabled.

Johan Widen
USENET: jw@sics.se
SHAR_EOF
cat << \SHAR_EOF > bawk.doc
NAME
    bawk - text processor

SYNOPSIS
    bawk rules [file] ...
    bawk -f rulefile file ...

DESCRIPTION
    Bawk is a text processing program that searches files for specific
    patterns and performs "actions" for every occurrence of these
    patterns. The patterns can be "regular expressions" as used in the
    UNIX "ex" editor. The actions are expressed using a subset of the
    "C" language.

    By default Bawk will interpret the first argument as a rule. You can
    force bawk to take the rules from a file by using the -f option. The
    following arguments are taken to be the names of text files on which
    the rules are to be applied.

    The special file name "-" may also be used anywhere on the command
    line to take input from the standard input device. The command:

        bawk - prog.c - prog.h

    would read the pattern and action rules from the standard input,
    then apply them to the files "prog.c", the standard input and
    "prog.h" in that order.

    The general format of a rules file is:

        <pattern> { <action> }
        <pattern> { <action> }
        ...

    There may be any number of these <pattern> { <action> } sequences in
    the rules file. Bawk reads a line of input from the current input
    file and applies every <pattern> { <action> } in sequence to the
    line.

    If the <pattern> corresponding to any { <action> } is missing, the
    action is applied to every line of input. The default { <action> }
    is to print the matched input line.

PATTERNS
    The <pattern>'s may consist of any valid C expression. If the
    <pattern> consists of two expressions separated by a comma, it is
    taken to be a range and the <action> is performed on all lines of
    input that match the range. <pattern>'s may contain "regular
    expressions" delimited by an '@' symbol. Regular expressions can be
    thought of as a generalized "wildcard" string matching mechanism,
    similar to that used by many operating systems to specify file
    names.

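As an illustration of the range form described above (two expressions separated by a comma), here is a minimal C sketch of how a range rule might track whether it is currently inside a range. It is not bawk's actual implementation: the names range_rule and range_step are hypothetical, and plain substring search with strstr() stands in for regular expression matching to keep the sketch short.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical illustration of a range rule: "start, stop". */
struct range_rule {
    const char *start;  /* substring that opens the range */
    const char *stop;   /* substring that closes the range */
    int startseen;      /* nonzero while a range is open */
};

/*
 * Feed one input line to the rule. Returns 1 if the action should be
 * performed on this line, 0 otherwise. Both the opening and the
 * closing line count as part of the range.
 */
int range_step(struct range_rule *r, const char *line)
{
    assert(r != NULL && line != NULL);

    if (!r->startseen) {
        if (strstr(line, r->start) == NULL)
            return 0;           /* outside any range */
        r->startseen = 1;       /* this line opens the range */
    }
    if (strstr(line, r->stop) != NULL)
        r->startseen = 0;       /* this line closes the range */
    return 1;
}
```

Under these assumptions, a rule set up as { "START", "END", 0 } selects the line containing START, every line after it, and the line containing END, then ignores input until START appears again.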
    Regular expressions may contain any of the following characters:

    x       An ordinary character (not mentioned below) matches that
            character.

    '\'     The backslash quotes any character. "\$" matches a
            dollar-sign.

    '^'     A circumflex at the beginning of an expression matches the
            beginning of a line.

    '$'     A dollar-sign at the end of an expression matches the end of
            a line.

    '.'     A period matches any single character except newline.

    ':x'    A colon matches a class of characters described by the
            character following it:

    ':a'    ":a" matches any alphabetic;

    ':d'    ":d" matches digits;

    ':n'    ":n" matches alphanumerics;

    ': '    ": " matches spaces, tabs, and other control characters,
            such as newline.

    '*'     An expression followed by an asterisk matches zero or more
            occurrences of that expression: "fo*" matches "f", "fo",
            "foo", "fooo", etc.

    '+'     An expression followed by a plus sign matches one or more
            occurrences of that expression: "fo+" matches "fo", "foo",
            "fooo", etc.

    '-'     An expression followed by a minus sign optionally matches
            the expression.

    '[]'    A string enclosed in square brackets matches any single
            character in that string, but no others. If the first
            character in the string is a circumflex, the expression
            matches any character except newline and the characters in
            the string. For example, "[xyz]" matches "xx" and "zyx",
            while "[^xyz]" matches "abc" but not "axb". A range of
            characters may be specified by two characters separated by
            "-". Note that [a-z] matches alphabetics, while [z-a] never
            matches.

    For example, the following rules file would print every line that
    contained a valid C identifier:

        @[a-zA-Z][a-zA-Z0-9]@

    And this rules file would print all lines between and including the
    ones that contained the words "START" and "END":

        @START@, @END@

ACTIONS
    Actions are expressed as a subset of the C language. All variables
    are global and default to int's if not formally declared. Variable
    declarations may appear anywhere within an action.
    Only char's and int's and pointers and arrays of char and int are
    allowed. Bawk allows only decimal integer constants to be used - no
    hex (0xnn) or octal (0nn). String and character constants may
    contain all of the special C escapes (\n, \r, etc.).

    Bawk supports the "if", "else", "while" and "break" flow of control
    constructs, which behave exactly as in C.

    Also supported are the following unary and binary operators, listed
    in order from highest to lowest precedence:

        operator            type    associativity
        () []               unary   left to right
        ! ~ ++ -- - * &     unary   right to left
        * / %               binary  left to right
        + -                 binary  left to right
        << >>               binary  left to right
        < <= > >=           binary  left to right
        == !=               binary  left to right
        &                   binary  left to right
        ^                   binary  left to right
        |                   binary  left to right
        &&                  binary  left to right
        ||                  binary  left to right
        =                   binary  right to left

    Comments are introduced by a '#' symbol and are terminated by the
    first newline character. The standard "/*" and "*/" comment
    delimiters are not supported and will result in a syntax error.

FIELDS
    When bawk reads a line from the current input file, the record is
    automatically separated into "fields". A field is simply a string of
    consecutive characters delimited by either the beginning or end of
    line, or a "field separator" character. Initially, the field
    separators are the space and tab characters. The special unary
    operator '$' is used to reference one of the fields in the current
    input record (line). The fields are numbered sequentially starting
    at 1. The expression "$0" references the entire input line.

    Similarly, the "record separator" is used to determine the end of an
    input "line", initially the newline character. The field and record
    separators may be changed programmatically by one of the actions and
    will remain in effect until changed again. If the record separator
    is empty then an empty line will be taken as record separator and
    tab, space and newline will be used as field separators.

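The field splitting described above can be sketched in C as follows. This is only an illustration under stated assumptions, not bawk's own parse() routine: the name split_fields is hypothetical, the line is split in place rather than copied, and a run of adjacent separators is treated as a single separator, which the text above does not specify.

```c
#include <assert.h>
#include <string.h>

/*
 * Split "line" in place into fields separated by any character found
 * in "seps" (for example " \t"). Pointers to the fields are stored in
 * fields[0..n-1], so that $1 corresponds to fields[0]. At most "max"
 * fields are stored. Returns the number of fields found.
 */
int split_fields(char *line, const char *seps, char *fields[], int max)
{
    int n = 0;
    char *p = line;

    assert(line != NULL && seps != NULL && fields != NULL);

    while (*p != '\0' && n < max) {
        p += strspn(p, seps);       /* skip separators before the field */
        if (*p == '\0')
            break;
        fields[n++] = p;            /* remember where the field starts */
        p += strcspn(p, seps);      /* move to the end of the field */
        if (*p != '\0')
            *p++ = '\0';            /* terminate it, step past separator */
    }
    return n;
}
```

Called with the separators " \t" on the line `120 PRINT  Name<tab>address`, this sketch yields four fields, with "address" as the fourth.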
    Fields behave exactly like strings, and can be used in the same
    context as a character array. These "arrays" can be considered to
    have been declared as:

        char ($n)[ 200 ];

    In other words, they are 200 bytes long. Notice that the parentheses
    are necessary because the operators [] and $ associate from right to
    left; without them, the statement would have parsed as:

        char $(1[ 200 ]);

    which is obviously ridiculous.

    If the contents of one of these field arrays is altered, the "$0"
    field will reflect this change. For example, this expression:

        *$4 = 'A';

    will change the first character of the fourth field to an upper-case
    letter 'A'. Then, when the following input line:

        120 PRINT "Name address Zip"

    is processed, it would be printed as:

        120 PRINT "Name Address Zip"

    Fields may also be modified with the strcpy() function (see below).
    For example, the expression:

        strcpy( $4, "Addr." );

    applied to the same line above would yield:

        120 PRINT "Name Addr. Zip"

PREDEFINED VARIABLES
    The following variables are pre-defined:

    FS              Field separator (see below).
    RS              Record separator (see below also).
    NF              Number of fields in current input record (line).
    NR              Number of records processed thus far.
    FILENAME        Name of current input file.
    BEGIN           A special <pattern> that matches the beginning of
                    input text, before the first record is read.
    END             A special <pattern> that matches the end of input
                    text, after the last record has been read.

    Bawk also provides some useful builtin functions for string
    manipulation and printing:

    printf(arg..)   Exactly the printf() function from C.
    getline()       Reads the next record from the current input file
                    and returns 0 on end of file.
    nextfile()      Closes out the current input file and begins
                    processing the next file in the list (if any).
    strlen(s)       Returns the length of its string argument.
    strcpy(s,t)     Copies the string "t" to the string "s".
    strcmp(s,t)     Compares "s" to "t" and returns 0 if they match.
    toupper(c)      Returns its character argument converted to
                    upper-case.
    tolower(c)      Returns its character argument converted to
                    lower-case.
    match(s,@re@)   Compares the string "s" to the regular expression
                    "re" and returns the number of matches found (zero
                    if none).

EXAMPLES
    The following rules file will scan a C program, counting the number
    of mismatched parentheses, brackets, and braces.

        @[()\[\]{}]@ {
            parens = parens + match( $0, @(@ );
            parens = parens - match( $0, @)@ );
            bracks = bracks + match( $0, @\[@ );
            bracks = bracks - match( $0, @]@ );
            braces = braces + match( $0, @{@ );
            braces = braces - match( $0, @}@ );
        }
        END { printf("parens=%d, brackets=%d, braces=%d\n",
                parens, bracks, braces );
        }

    This program will capitalize the first word in every sentence of a
    document:

        BEGIN {
            strcpy(RS,".");  # set record separator to a period
        }
        {
            if ( match( $1, @^[a-z]@ ) )
                *$1 = toupper( *$1 );
            printf( "%s\n", $0 );
        }

LIMITATIONS
    Bawk was originally written in BDS C, but every attempt was made to
    keep the code as portable as possible. The program should be
    compilable with any "standard" C compiler. On CP/M systems compiled
    with BDS C, bawk takes up about 24K.

    An input record may be no longer than 200 characters. If longer
    records are encountered, they terminate prematurely and the next
    record starts where the previous one was hacked off.

    A single pattern or action statement may be no longer than about 4K
    characters, excluding comments and whitespace. Since the program is
    semi-compiled the tokenized version will probably wind up being
    smaller than the source code, so the 4K figure is only approximate.

AUTHOR
    Bob Brodt
    486 Linden Ave.
    Bogota, NJ 07603

ACKNOWLEDGEMENTS
    The concept for bawk (and 3/4 of the name!) was taken from the
    program "awk" written by Alfred V. Aho, Brian W. Kernighan and Peter
    J. Weinberger. My apologies for any irreverences.

    The regular expression compiler/parser was borrowed from a program
    called "grep" and has been highly modified. Grep is distributed by
    the DEC Users Society (DECUS) and is Copyright (C) 1980 by DECUS.
    The author acknowledges DECUS with a nod of thanks for giving their
    general permission and okey-dokey to copy or modify the grep
    program.

    UNIX is a trademark of AT&T Bell Labs.
SHAR_EOF
cat << \SHAR_EOF > bawk.h
/*
 * Bawk constants and variable declarations.
 */
#include <stdlib.h>
#include <ctype.h>
/* #define ANSI_OFF */
#define DBUG_OFF
#include <dbug.h>
#ifdef BDS_C
#define EXTERN /* */
#else

#ifdef MAIN
#define EXTERN /* */
#else
#define EXTERN extern
#endif

#endif

/*
 * If QUOTE_STRING_HACK is defined then Bawk programs passed on the
 * command line may delimit strings with either `grave accent` or
 * "double quotes".
 */
#define QUOTE_STRING_HACK

/*
 * Table and buffer sizes
 */
#define MAXLINELEN      200     /* longest input line */
#define MAXWORDS        (MAXLINELEN/2)  /* max # of words in a line */
#define MAXWORKBUFLEN   4096    /* longest action or regular expression */
#define MAXVARTABSZ     50      /* max # of symbols */
#define MAXVARLEN       10      /* symbol name length */
#define MAXSTACKSZ      40      /* max value stack length (for expressions) */


/**********************************************************
 * Current Input File variables                           *
 **********************************************************/
/*
 * Current Input File pointer:
 */
#ifdef BDS_C
EXTERN char *Fileptr, Curfbuf[ BUFSIZ ];
#else
EXTERN FILE *Fileptr;
#endif
EXTERN char *Filename;          /* current input file name */
EXTERN int Linecount;           /* current input line number */
EXTERN int Recordcount;         /* record count */
/*
 * Working buffers.
 */
EXTERN char Linebuf[ MAXLINELEN+1 ];    /* current input line buffer */
EXTERN char *Fields[ MAXWORDS+1 ];      /* pointers to the words in Linebuf */
EXTERN int Fieldcount;                  /* and the # of words */
EXTERN char Workbuf[ MAXWORKBUFLEN+1 ]; /* work area for C action and */
                                        /* regular expression parsers */

/**********************************************************
 * Regular Expression Parser variables                    *
 **********************************************************/
/*
 * Tokens:
 */
#define CHAR    1
#define BOL     2
#define EOL     3
#define ANY     4
#define CLASS   5
#define NCLASS  6
#define STAR    7
#define PLUS    8
#define MINUS   9
#define ALPHA   10
#define DIGIT   11
#define NALPHA  12
#define PUNCT   13
#define RANGE   14
#define ENDPAT  15


/**********************************************************
 * C Actions Interpreter variables                        *
 **********************************************************/
/*
 * Tokens:
 */
#define T_STRING        16      /* primaries: */
#define T_DOLLAR        17
#define T_REGEXP        18
#define T_REGEXP_ARG    19
#define T_CONSTANT      20
#define T_VARIABLE      21
#define T_FUNCTION      22
#define T_SEMICOLON     23      /* punctuation */
#define T_EOF           24
#define T_LBRACE        25
#define T_RBRACE        26
#define T_LPAREN        27
#define T_RPAREN        28
#define T_LBRACKET      29
#define T_RBRACKET      30
#define T_COMMA         31
#define T_ASSIGN        32      /* operators: */
#define T_STAR          33      /* *foo */
#define T_MUL           34
#define T_DIV           35
#define T_MOD           36
#define T_ADD           37
#define T_UMINUS        38      /* -foo */
#define T_SUB           39
#define T_SHL           40
#define T_SHR           41
#define T_LT            42
#define T_LE            43
#define T_GT            44
#define T_GE            45
#define T_EQ            46
#define T_NE            47
#define T_NOT           48
#define T_ADDROF        49      /* &foo */
#define T_AND           50
#define T_XOR           51
#define T_OR            52
#define T_LNOT          53
#define T_LAND          54
#define T_LOR           55
#define T_INCR          56
#define T_DECR          57
#define T_POSTINCR      58      /* foo++ */
#define T_POSTDECR      59      /* foo-- */
#define T_IF            60      /* keywords: */
#define T_ELSE          61
#define T_WHILE         62
#define T_BREAK         63
#define T_CHAR          64
#define T_INT           65
#define T_BEGIN         66
#define T_END           67
#define T_NF            68
#define T_NR            69
#define T_FS            70
#define T_RS            71
#define T_FILENAME      72
#define T_STATEMENT     73
#define T_DECLARE       74      /* char foo */
#define T_ARRAY_DECLARE 75      /* char foo[5] */
#define MAX_TOKEN       T_ARRAY_DECLARE

#ifndef DBUG_OFF
extern char *token_name[];
#endif

#define PATTERN 'P'     /* indicates C statement is within a pattern */
#define ACTION  'A'     /* indicates C statement is within an action */

/*
 * Symbol table
 */
struct variable {
    char vname[ MAXVARLEN ];
    char vclass;
    char vsize;
    int vlen;
    char *vptr;
};
#define VARIABLE struct variable
EXTERN VARIABLE Vartab[ MAXVARTABSZ ], *Nextvar;

/* A variable may be redeclared. Is this a feature? Should we have block */
/* scoping? vardecl stores the redeclaration info. */
struct vardecl {
    VARIABLE *variable;
    char vclass;
    char vsize;
};
#define VARDECL struct vardecl

/*
 * Symbol Table values
 */
#define ACTUAL  0
#define LVALUE  1
#define BYTE    1
#define WORD    (sizeof(char *))

/*
 * Value stack
 */
union datum {
    int ival;
    char *dptr;
    char **ptrptr;
};
#define DATUM union datum
struct item {
    char class;
    char lvalue;
    char size;
    DATUM value;
};
#define ITEM struct item
EXTERN ITEM Stackbtm[ MAXSTACKSZ ], *Stackptr, *Stacktop;

/*
 * parse tree
 */
struct expr_node {
    struct expr_node *left;
    struct expr_node *right;
    char operator;
};
#define EXPR_NODE struct expr_node

/*
 * Miscellaneous
 */
EXTERN char *Actptr;            /* pointer into Workbuf during compilation */
EXTERN char Token;              /* current input token */
EXTERN DATUM Value;             /* and its value */
EXTERN char Saw_break;          /* set when break stmt seen */
EXTERN char Where;              /* indicates whether C stmt is a PATTERN or ACTION */
EXTERN char Fieldsep[128];      /* field separator */
EXTERN char Recordsep[128];     /* record separator */
EXTERN EXPR_NODE *Beginact;     /* BEGINning of input actions */
EXTERN EXPR_NODE *Endact;       /* END of input actions */

/**********************************************************
 * Rules structure                                        *
 **********************************************************/
struct rule {
    struct {
        EXPR_NODE *start;   /* C statements that match pattern start */
        EXPR_NODE *stop;    /* C statements that match pattern end */
        char startseen;     /* set if both a start and stop pattern */
                            /* given and if an input line matched the */
                            /* start pattern */
    } pattern;
    EXPR_NODE *action;      /* quasi-C statements parse tree */
    struct rule *nextrule;  /* pointer to next rule */
};
#define RULE struct rule
EXTERN RULE *Rules,     /* rule structures linked list head */
            *Rulep;     /* working pointer */


/**********************************************************
 * Miscellaneous                                          *
 **********************************************************/
/*
 * Error exit values (returned to command shell)
 */
#define USAGE_ERROR     1       /* error in invocation */
#define FILE_ERROR      2       /* file not found errors */
#define RECORD_ERROR    3       /* input record too long */
#define RE_ERROR        4       /* bad regular expression */
#define ACT_ERROR       5       /* bad C action stmt */
#define MEM_ERROR       6       /* out of memory errors */

/*
 * Functions that return something special:
 */
#ifdef ANSI_OFF
extern EXPR_NODE *act_compile();
extern VARIABLE *addvar();
extern void assignment();
extern char *cclass();
extern void compile();
extern EXPR_NODE *decl_parse();
extern EXPR_NODE *declist_parse();
extern void doaction();
extern int dopattern();
extern void endfile();
extern void error();
extern EXPR_NODE *expr_parse(), *expr_left_to_right_parse();
extern int fetchint();
extern char *fetchptr();
extern VARIABLE *findvar();
extern void function();
extern int getcharacter();
extern EXPR_NODE *get_expr_node();
extern int getline();
extern char *getmemory();
extern char *get_clear_memory();
extern char getoken();
extern void init_pop_array();
extern int instr();
extern int isfunction();
extern int iskeyword();
extern int match();
extern void newfile ();
extern int parse();
extern EXPR_NODE *pat_compile();
extern char *pmatch();
extern int pop();
extern int popint();
extern void postincdec();
extern void preincdec();
extern EXPR_NODE *primary_parse();
extern void process();
extern void push();
extern void pushint();
extern int re_compile();
extern void stmt_lex();
extern EXPR_NODE *stmt_parse();
extern void storeint();
extern void storeptr();
extern char *str_compile();
extern void syntaxerror();
extern int ungetcharacter();
extern void unparse();
extern void usage();
extern void walk_tree();
#else /* ANSI_OFF */
extern EXPR_NODE *act_compile(char *);
extern VARIABLE *addvar(char *);
extern void assignment(void);
extern char *cclass(char *);
extern void compile(void);
extern EXPR_NODE *decl_parse(int);
extern EXPR_NODE *declist_parse(void);
extern void doaction(EXPR_NODE *);
extern int dopattern(EXPR_NODE *);
extern void endfile(void);
extern void error(char *,int);
extern EXPR_NODE *expr_parse(void), *expr_left_to_right_parse(char);
extern int fetchint(char *);
extern char *fetchptr(char *);
extern VARIABLE *findvar(char *);
extern void function(int,EXPR_NODE *);
extern int getcharacter(void);
extern EXPR_NODE *get_expr_node(char);
extern int getline(void);
extern char *getmemory(unsigned);
extern char *get_clear_memory(unsigned);
extern char getoken(void);
extern void init_pop_array(void);
extern int instr(char,char *);
extern int isfunction(char *);
extern int iskeyword(char *);
extern int match(char *,char *);
extern void newfile (char *);
extern int parse(char *,char **,char *);
extern EXPR_NODE *pat_compile(char *);
extern char *pmatch(char *,char *,char *);
extern int pop(void);
extern int popint(void);
extern void postincdec(int);
extern void preincdec(int);
extern EXPR_NODE *primary_parse(void);
extern void process(void);
extern void push(char,char,char,DATUM *);
extern void pushint(int);
extern int re_compile(char *);
extern void stmt_lex(char *);
extern EXPR_NODE *stmt_parse(void);
extern void storeint(char *,int);
extern void storeptr(char *,char *);
extern char *str_compile(char *,char);
extern void syntaxerror(void);
extern int ungetcharacter(char);
extern void unparse(char **,int,char *,char *);
extern void usage(void);
extern void walk_tree(EXPR_NODE *);
#endif /* ANSI_OFF */
SHAR_EOF
cat << \SHAR_EOF > bawkparse.c
X/*
X * Bawk C actions parser
X */
X#include <stdio.h>
X#include "bawk.h"
X
Xstatic char operator_strength[] = {
X0,  /* 0 */
X0,  /* CHAR */
X0,  /* BOL */
X0,  /* EOL */
X0,  /* ANY */
X0,  /* CLASS */
X0,  /* NCLASS */
X0,  /* STAR */
X0,  /* PLUS */
X0,  /* MINUS */
X0,  /* ALPHA */
X0,  /* DIGIT */
X0,  /* NALPHA */
X0,  /* PUNCT */
X0,  /* RANGE */
X0,  /* ENDPAT */
X0,  /* T_STRING */
X0,  /* T_DOLLAR */
X0,  /* T_REGEXP */
X0,  /* T_REGEXP_ARG */
X0,  /* T_CONSTANT */
X0,  /* T_VARIABLE */
X0,  /* T_FUNCTION */
X0,  /* T_SEMICOLON */
X0,  /* T_EOF */
X0,  /* T_LBRACE */
X0,  /* T_RBRACE */
X0,  /* T_LPAREN */
X0,  /* T_RPAREN */
X0,  /* T_LBRACKET */
X0,  /* T_RBRACKET */
X0,  /* T_COMMA */
X1,  /* T_ASSIGN */
X0,  /* T_STAR */
X11, /* T_MUL */
X11, /* T_DIV */
X11, /* T_MOD */
X10, /* T_ADD */
X0,  /* T_UMINUS */
X10, /* T_SUB */
X9,  /* T_SHL */
X9,  /* T_SHR */
X8,  /* T_LT */
X8,  /* T_LE */
X8,  /* T_GT */
X8,  /* T_GE */
X7,  /* T_EQ */
X7,  /* T_NE */
X0,  /* T_NOT */
X0,  /* T_ADDROF */
X6,  /* T_AND */
X5,  /* T_XOR */
X4,  /* T_OR */
X0,  /* T_LNOT */
X3,  /* T_LAND */
X2,  /* T_LOR */
X0,  /* T_INCR */
X0,  /* T_DECR */
X0,  /* T_POSTINCR */
X0,  /* T_POSTDECR */
X0,  /* T_IF */
X0,  /* T_ELSE */
X0,  /* T_WHILE */
X0,  /* T_BREAK */
X0,  /* T_CHAR */
X0,  /* T_INT */
X0,  /* T_BEGIN */
X0,  /* T_END */
X0,  /* T_NF */
X0,  /* T_NR */
X0,  /* T_FS */
X0,  /* T_RS */
X0,  /* T_FILENAME */
X0,  /* T_STATEMENT */
X0,  /* T_DECLARE */
X0   /* T_ARRAY_DECLARE */
X};
X
XEXPR_NODE *stmt_parse()
X{
X    /*
X     * Parse a statement.
X     */
X    register EXPR_NODE *root = NULL, *end_pointer, *tmp;
X
X    DBUG_ENTER("stmt_parse");
X    switch ( Token )
X    {
X    case T_EOF:
X        break;
X    case T_CHAR:
X    case T_INT:
X        root = declist_parse();
X        break;
X    case T_LBRACE:
X        /*
X         * parse a compound statement
X         */
X        getoken();
X        while ( Token != T_RBRACE )
X        {
X            tmp = get_expr_node((char) T_STATEMENT);
X            if(!root) {
X                root = end_pointer = tmp;
X            } else {
X                end_pointer->right = tmp;
X                end_pointer = tmp;
X            }
X            end_pointer->left = stmt_parse();
X        }
X        if ( Token==T_RBRACE )
X            getoken();
X        break;
X    case T_IF:
X        /*
X         * parse an "if-else" statement
X         */
X        if ( getoken() != T_LPAREN )
X            syntaxerror();
X        getoken();
X        root = get_expr_node((char) T_IF);
X        root->left = end_pointer = get_expr_node((char) T_IF);
X        end_pointer->left = expr_parse();
X        if ( Token!=T_RPAREN )
X            syntaxerror();
X        getoken();
X        end_pointer->right = stmt_parse();
X        if ( Token==T_ELSE )
X        {
X            getoken();
X            root->right = stmt_parse();
X        }
X        break;
X    case T_WHILE:
X        /*
X         * parse a "while" statement
X         */
X        root = get_expr_node((char) T_WHILE);
X        if ( getoken() != T_LPAREN )
X            syntaxerror();
X
X        getoken();
X        root->left = expr_parse();
X        if ( Token!=T_RPAREN )
X            syntaxerror();
X
X        getoken();
X        root->right = stmt_parse();
X        break;
X    case T_BREAK:
X        /*
X         * parse a "break" statement
X         */
X        root = get_expr_node((char) T_BREAK);
X        getoken();
X        break;
X    case T_SEMICOLON:
X        break;
X    default:
X        root = expr_parse();
X    }
X
X    if ( Token==T_SEMICOLON )
X        getoken();
X    DBUG_RETURN(root);
X}
X
XEXPR_NODE *expr_parse()
X{
X    register EXPR_NODE *root, *tmp;
X    register char strength;
X
X    DBUG_ENTER("expr_parse");
X    strength = operator_strength[T_ASSIGN];
X    root = expr_left_to_right_parse(strength);
X    if(strength == operator_strength[Token])
X    {
X        /* assignments are grouped right to left */
X        tmp = get_expr_node(Token);
X        tmp->left = root;
X        root = tmp;
X        getoken();
X        root->right = expr_parse();
X    }
X    DBUG_RETURN(root);
X}
X
XEXPR_NODE *expr_left_to_right_parse(parent_strength)
Xregister char parent_strength;
X{
X    register EXPR_NODE *root, *tmp;
X    register char strength;
X
X    DBUG_ENTER("expr_left_to_right_parse");
X    root = primary_parse();
X    if(parent_strength < (strength = operator_strength[Token]))
X    {
X        while(strength == operator_strength[Token])
X        {
X            tmp = get_expr_node(Token);
X            tmp->left = root;
X            root = tmp;
X            getoken();
X            root->right = expr_left_to_right_parse(strength);
X        }
X    }
X    DBUG_RETURN(root);
X}
X
XEXPR_NODE *primary_parse()
X{
X    register EXPR_NODE *root = NULL, *end_pointer, *tmp;
X    register int lpar;
X
X    DBUG_ENTER("primary_parse");
X    switch ( Token )
X    {
X    case T_LPAREN:
X        /*
X         * it's a parenthesized expression
X         */
X        getoken();
X        root = expr_parse();
X        if ( Token!=T_RPAREN )
X            error( "missing ')'", ACT_ERROR );
X        getoken();
X        break;
X    case T_LNOT:
X    case T_NOT:
X    case T_INCR:
X    case T_DECR:
X    case T_DOLLAR:
X        root = get_expr_node(Token);
X        getoken();
X        root->left = primary_parse();
X        break;
X    case T_SUB:
X        root = get_expr_node((char) T_UMINUS);
X        getoken();
X        root->left = primary_parse();
X        break;
X    case T_MUL:
X        root = get_expr_node((char) T_STAR);
X        getoken();
X        root->left = primary_parse();
X        break;
X    case T_AND:
X        root = get_expr_node((char) T_ADDROF);
X        getoken();
X        root->left = primary_parse();
X        break;
X    case T_ADD:
X        getoken();
X        root = primary_parse();
X        break;
X    case T_CONSTANT:
X        root = get_expr_node(Token);
X        root->left = (EXPR_NODE *) getmemory(sizeof(DATUM));
X        ((DATUM *) (root->left))->ival = Value.ival;
X        getoken();
X        break;
X    case T_FUNCTION:
X        root = get_expr_node(Token);
X        root->left = (EXPR_NODE *) getmemory(sizeof(DATUM));
X        ((DATUM *) (root->left))->ival = Value.ival;
X        getoken();
X        if ( Token==T_LPAREN )
X        {
X            lpar = 1;
X            getoken();
X        }
X        else
X            lpar = 0;
X        /*
X         * Parse arguments into a list of expressions.
X         */
X        if ( Token!=T_RPAREN && Token!=T_EOF )
X        {
X            for ( ;; )
X            {
X                tmp = get_expr_node((char) T_FUNCTION);
X                if(!root->right) {
X                    root->right = end_pointer = tmp;
X                } else {
X                    end_pointer->right = tmp;
X                    end_pointer = tmp;
X                }
X                end_pointer->left = expr_parse();
X                if((tmp = end_pointer->left) &&
X                   (tmp->operator == T_REGEXP))
X                    tmp->operator = T_REGEXP_ARG;
X                if ( Token==T_COMMA )
X                    getoken();
X                else
X                    break;
X            }
X        }
X        if ( lpar )
X            if( Token!=T_RPAREN )
X                error( "missing ')'", ACT_ERROR );
X            else
X                getoken();
X        break;
X    case T_REGEXP:
X    case T_STRING:
X        root = get_expr_node(Token);
X        root->left = (EXPR_NODE *) getmemory(strlen(Value.dptr) + 1);
X        strcpy((char *) root->left, Value.dptr);
X        getoken();
X        break;
X    case T_NF:
X    case T_NR:
X    case T_FS:
X    case T_RS:
X    case T_FILENAME:
X    case T_BEGIN:
X    case T_END:
X        root = get_expr_node(Token);
X        getoken();
X        break;
X    case T_VARIABLE:
X        root = get_expr_node(Token);
X        root->left = (EXPR_NODE *) Value.dptr;
X        getoken();
X        break;
X    case T_EOF:
X        break;
X    default:
X        syntaxerror();
X    }
X    /*
X     * a "[" means it's an array reference
X     */
X    if ( Token==T_LBRACKET )
X    {
X        tmp = get_expr_node(Token);
X        tmp->left = root;
X        root = tmp;
X        getoken();
X        root->right = expr_parse();
X        if ( Token!=T_RBRACKET )
X            error( "missing ']'", ACT_ERROR );
X        getoken();
X    }
X
X    if ( Token==T_INCR || Token==T_DECR )
X    {
X        tmp = get_expr_node((char)
X            ((Token==T_INCR) ? T_POSTINCR : T_POSTDECR));
X        tmp->left = root;
X        root = tmp;
X    }
X    DBUG_RETURN(root);
X}
X
Xvoid syntaxerror()
X{
X    DBUG_ENTER("syntaxerror");
X    error( "syntax error", ACT_ERROR );
X    DBUG_VOID_RETURN;
X}
SHAR_EOF
cat << \SHAR_EOF > bawksym.c
/*
 * Bawk C actions builtin functions, variable declaration, and
 * stack management routines.
 */
#include <stdio.h>
#include "bawk.h"

#define MAXARGS     10  /* max # of arguments to a builtin func */

#define F_PRINTF    1
#define F_GETLINE   2
#define F_STRLEN    3
#define F_STRCPY    4
#define F_STRCMP    5
#define F_TOUPPER   6
#define F_TOLOWER   7
#define F_MATCH     8
#define F_NEXTFILE  9

int isfunction( s )
register char *s;
{
    /*
     * Compare the string "s" to a list of builtin functions
     * and return its (non-zero) token number.
     * Return zero if "s" is not a function.
     */
    DBUG_ENTER("isfunction");
    switch(*s) {
    case 'g':
        if ( !strcmp( s, "getline" ) )
            DBUG_RETURN(F_GETLINE);
        break;
    case 'm':
        if ( !strcmp( s, "match" ) )
            DBUG_RETURN(F_MATCH);
        break;
    case 'n':
        if ( !strcmp( s, "nextfile" ) )
            DBUG_RETURN(F_NEXTFILE);
        break;
    case 'p':
        if ( !strcmp( s, "printf" ) )
            DBUG_RETURN(F_PRINTF);
        break;
    case 's':
        if ( !strcmp( s, "strlen" ) )
            DBUG_RETURN(F_STRLEN);
        if ( !strcmp( s, "strcpy" ) )
            DBUG_RETURN(F_STRCPY);
        if ( !strcmp( s, "strcmp" ) )
            DBUG_RETURN(F_STRCMP);
        break;
    case 't':
        if ( !strcmp( s, "toupper" ) )
            DBUG_RETURN(F_TOUPPER);
        if ( !strcmp( s, "tolower" ) )
            DBUG_RETURN(F_TOLOWER);
        break;
    default:;
    }
    DBUG_RETURN(0);
}

int iskeyword( s )
register char *s;
{
    /*
     * Compare the string "s" to a list of keywords and return its
     * (non-zero) token number. Return zero if "s" is not a keyword.
     */
    DBUG_ENTER("iskeyword");
    switch(*s) {
    case 'b':
        if ( !strcmp( s, "break" ) )
            DBUG_RETURN(T_BREAK);
        break;
    case 'c':
        if ( !strcmp( s, "char" ) )
            DBUG_RETURN(T_CHAR);
        break;
    case 'e':
        if ( !strcmp( s, "else" ) )
            DBUG_RETURN(T_ELSE);
        break;
    case 'i':
        if ( !strcmp( s, "int" ) )
            DBUG_RETURN(T_INT);
        if ( !strcmp( s, "if" ) )
            DBUG_RETURN(T_IF);
        break;
    case 'w':
        if ( !strcmp( s, "while" ) )
            DBUG_RETURN(T_WHILE);
        break;
    case 'B':
        if ( !strcmp( s, "BEGIN" ) )
            DBUG_RETURN(T_BEGIN);
        break;
    case 'E':
        if ( !strcmp( s, "END" ) )
            DBUG_RETURN(T_END);
        break;
    case 'F':
        if ( !strcmp( s, "FS" ) )
            DBUG_RETURN(T_FS);
        if ( !strcmp( s, "FILENAME" ) )
            DBUG_RETURN(T_FILENAME);
        break;
    case 'N':
        if ( !strcmp( s, "NF" ) )
            DBUG_RETURN(T_NF);
        if ( !strcmp( s, "NR" ) )
            DBUG_RETURN(T_NR);
        break;
    case 'R':
        if ( !strcmp( s, "RS" ) )
            DBUG_RETURN(T_RS);
        break;
    default:;
    }
    DBUG_RETURN(0);
}

void function( funcnum, arg_root )
register int funcnum;
register EXPR_NODE *arg_root;
{
    register int argc, args[ MAXARGS ];

    DBUG_ENTER("function");
    argc = 0;
    /*
     * If there are any arguments, evaluate them and copy their values
     * to a local array.
     */
    for(; argc < MAXARGS && arg_root; arg_root = arg_root->right)
    {
        walk_tree(arg_root->left);
        args[ argc++ ] = popint();
    }

    switch ( funcnum )
    {
    case F_PRINTF:  /* just like the real printf() function */
        pushint( printf( (char *) args[0], args[1], args[2], args[3],
            args[4], args[5], args[6], args[7], args[8],
            args[9] ) );
        break;
    case F_GETLINE:
        /*
         * Get the next line of input from the current input file
         * and parse according to the current field separator.
         * Don't forget to free up the previous line's words first...
         */
        while ( Fieldcount )
            free( Fields[ --Fieldcount ] );
        pushint( getline() );
        Fieldcount = parse( Linebuf, Fields, Fieldsep );
        break;
    case F_STRLEN:  /* calculate length of string argument */
        pushint( strlen( args[0] ) );
        break;
    case F_STRCPY:  /* copy second string argument to first string */
        pushint( strcpy( args[0], args[1] ) );
        break;
    case F_STRCMP:  /* compare two strings */
        pushint( strcmp( args[0], args[1] ) );
        break;
    case F_TOUPPER: /* convert the character argument to upper case */
        pushint( toupper( args[0] ) );
        break;
    case F_TOLOWER: /* convert the character argument to lower case */
        pushint( tolower( args[0] ) );
        break;
    case F_MATCH:   /* match a string argument to a regular expression */
        pushint( match( (char *) args[0], (char *) args[1] ) );
        break;
    case F_NEXTFILE:/* close current input file and process next file */
        endfile();
        pushint( 1 );   /* is this a correct value? jw */
        break;
    default:    /* oops! */
        error( "bad function call", ACT_ERROR );
    }
    DBUG_VOID_RETURN;
}

VARIABLE *
findvar( s )
register char *s;
{
    /*
     * Search the symbol table for a variable whose name is "s".
     */
    register VARIABLE *pvar;
    register int i;
    register char name[ MAXVARLEN ];

    DBUG_ENTER("findvar");
    i = 0;
    while ( i < MAXVARLEN && (isalnum( *s ) || (*s == '_')))
        name[i++] = *s++;
    if ( i<MAXVARLEN )
        name[i] = 0;

    for ( pvar = Vartab; pvar<Nextvar; ++pvar ) {
        if ( !strncmp( pvar->vname, name, MAXVARLEN ) )
            DBUG_RETURN(pvar);
    }
    DBUG_RETURN(NULL);
}

VARIABLE *
addvar( name )
register char *name;
{
    /*
     * Add a new variable to symbol table and assign it default
     * attributes (int name;)
     */
    register int i;

    DBUG_ENTER("addvar");
    if ( Nextvar <= Vartab + MAXVARTABSZ ) {
        i = 0;
        while ( i<MAXVARLEN && (isalnum( *name ) || (*name == '_')))
            Nextvar->vname[i++] = *name++;
        if ( i<MAXVARLEN )
            Nextvar->vname[i] = 0;

        Nextvar->vclass = 0;
        Nextvar->vsize = WORD;
        Nextvar->vlen = 0;
        /*
         * Allocate some new room
         */
        Nextvar->vptr = get_clear_memory( WORD );
    }
    else
        error( "symbol table overflow", MEM_ERROR );

    DBUG_RETURN(Nextvar++);
}

EXPR_NODE *declist_parse()
{
    /*
     * Parse a "char" or "int" statement.
     */
    register char type;
    register EXPR_NODE *root, *end_pointer;

    DBUG_ENTER("declist_parse");
    type = Token;
    getoken();
    root = end_pointer = decl_parse( type );
    while ( Token==T_COMMA ) {
        getoken();
        end_pointer->right = decl_parse( type );
        end_pointer = end_pointer->right;
    }
    if ( Token==T_SEMICOLON )
        getoken();
    DBUG_RETURN(root);
}

EXPR_NODE *decl_parse( type )
register int type;
{
    /*
     * Parse an element of a "char" or "int" declaration list.
     * The function stmt_compile() has already entered the variable
     * into the symbol table as an integer; this routine simply changes
     * the symbol's class, size or length according to the declaration.
     * WARNING: The interpreter depends on the fact that pointers are
     * the same length as ints. If your machine uses longs for
     * pointers, either change the code or #define int long (or whatever).
     */
    register char class, size;
    register VARIABLE *pvar;
    register VARDECL *pdecl;
    register EXPR_NODE *node;
    EXPR_NODE *action;

    DBUG_ENTER("decl_parse");
    if ( Token==T_MUL ) {
        /*
         * it's a pointer
         */
        getoken();
        node = decl_parse( type );
        if(node->operator == T_DECLARE)
            ((VARDECL *) (node->right))->vclass += 1;
        else
            ((VARDECL *) (node->right->right))->vclass += 1;
    }
    else if ( Token==T_VARIABLE ) {
        /*
         * Simple variable so far.  The token value (in the global
         * "Value" variable) is a pointer to the variable's symbol
         * table entry.
         */
        pdecl = (VARDECL *) getmemory(sizeof(VARDECL));
        pvar = (VARIABLE *) Value.dptr;
        getoken();
        class = 0;
        /*
         * Compute its length
         */
        if ( Token==T_LBRACKET ) {
            /*
             * It's an array.
             */
            node = get_expr_node((char) T_ARRAY_DECLARE);
            node->left = action = get_expr_node((char) T_ARRAY_DECLARE);
            action->left = (EXPR_NODE *) pdecl;
            getoken();
            ++class;
            /*
             * Parse the dimension expression
             */
            action->right = expr_parse();
            if ( Token!=T_RBRACKET )
                error( "missing ']'", ACT_ERROR );
            getoken();
        }
        else {
            /*
             * It's a simple variable.
             */
            node = get_expr_node((char) T_DECLARE);
            node->left = (EXPR_NODE *) pdecl;
        }
        size = (type==T_CHAR) ?
            BYTE : WORD;
        pdecl->variable = pvar;
        pdecl->vclass = class;
        pdecl->vsize = size;
    }
    else
        syntaxerror();
    DBUG_RETURN(node);
}

void assignment()
{
    /*
     * Perform an assignment
     */
    int ival;

    DBUG_ENTER("assignment");
    ival = popint();
    /*
     * make sure we've got an lvalue
     */
    if ( Stackptr->lvalue ) {
        if ( Stackptr->class )
            movmem((char *) &ival, Stackptr->value.dptr, WORD );
        else
            movmem((char *) &ival, Stackptr->value.dptr, Stackptr->size);
        pop();
        pushint( ival );
    }
    else
        error( "'=' needs an lvalue", ACT_ERROR );
    DBUG_VOID_RETURN;
}

int pop()
{
    /*
     * Pop the stack and return the integer value
     */
    DBUG_ENTER("pop");
    if ( Stackptr >= Stackbtm )
        DBUG_RETURN((Stackptr--)->value.ival);
    DBUG_RETURN(error( "stack underflow", ACT_ERROR ));
}

void push( pclass, plvalue, psize, pdatum )
register char pclass, plvalue, psize;
register DATUM *pdatum;
{
    /*
     * Push item parts onto the stack
     */
    DBUG_ENTER("push");
    if ( ++Stackptr <= Stacktop ) {
        Stackptr->lvalue = plvalue;
        Stackptr->size = psize;
        if ( !(Stackptr->class = pclass) && !plvalue )
            Stackptr->value.ival = pdatum->ival;
        else
            Stackptr->value.dptr = pdatum->dptr;
    }
    else
        error( "stack overflow", MEM_ERROR );
    DBUG_VOID_RETURN;
}

void pushint( intvalue )
register int intvalue;
{
    /*
     * push an integer onto the stack
     */
    DBUG_ENTER("pushint");
    if ( ++Stackptr <= Stacktop ) {
        Stackptr->lvalue = Stackptr->class = 0;
        Stackptr->size = WORD;
        Stackptr->value.ival = intvalue;
    }
    else
        error( "stack overflow", MEM_ERROR );
    DBUG_VOID_RETURN;
}

int popint()
{
    /*
     * Resolve the item on the top of the stack and return it
     */
    register int intvalue;

    DBUG_ENTER("popint");
    if ( Stackptr->lvalue ) {
        /*
         * if it's a byte indirect, sign extend it
         */
        if ( Stackptr->size == BYTE && !Stackptr->class )
            intvalue = *Stackptr->value.dptr;
        else {
            /*
             * otherwise, it's an unsigned int
             */
            intvalue = (int) (*Stackptr->value.ptrptr);
        }
        pop();
        DBUG_RETURN(intvalue);
    }
    else {
        /*
         * else it's an ACTUAL, just pop it
         */
        DBUG_RETURN(pop());
    }
}
SHAR_EOF
cat << \SHAR_EOF > example1
@[()\[\]{}]@ {
    parens = parens + match( $0, @(@ );
    parens = parens - match( $0, @)@ );
    bracks = bracks + match( $0, @\[@ );
    bracks = bracks - match( $0, @]@ );
    braces = braces + match( $0, @{@ );
    braces = braces - match( $0, @}@ );
}
END {
    printf("parens=%d, brackets=%d, braces=%d\n", parens, bracks, braces );
}
SHAR_EOF
cat << \SHAR_EOF > example3
BEGIN{strcpy RS,""}
{printf "%d %d %s\n",NR,NF,$0}
END{printf "total %d\n",NR}
SHAR_EOF
cat << \SHAR_EOF > link.cmd
FROM LIB:c.o+bawk.o+bawkact.o+bawkdo.o+bawkpat.o+bawksym.o+bawkparse.o
TO bawk
LIB LIB:lc.lib+LIB:amiga.lib
MAP nil:
SHAR_EOF
cat << \SHAR_EOF > tst2
@[(]@ {
    printf("parens=%d\n", parens );
    parens = parens + match( $0, @(@ );
    printf("parens=%d\n", parens );
}
END {
    printf("parens=%d\n", parens );
}
SHAR_EOF