parag@hpsdeb.sde.hp.com (Parag Patel) (05/19/91)
Submitted-by: Parag Patel <parag@hpsdeb.sde.hp.com>
Posting-number: Volume 19, Issue 88
Archive-name: wacco/part01

This is version 1.1 of Wacco, basically an LL(1) parser generator.

Wacco generates recursive-descent C++ code from an input file. The wacco file.w looks a lot like a yacc(1) input file, but with a lot more syntactic sugar added. Since the generated parser recurses, you can do attribute-driven parsing easily and even pass information into rules which could alter the parse.

Wacco should port and run easily on most C++ systems. It does need C++ 2.0 of some flavor. It's been successfully built on HP-UX s300 and s800 systems, Sparc, and 4.3BSD running on HP hardware.

The code is somewhat commented. Feel free to hack away, add new features, or fix my screwups. If you make mods you feel are useful, or fix some bug, please send me the cdiffs so I can make them available to others too.

	Parag Patel <parag@sde.hp.com>

---- Cut Here and feed the following to sh ----
#!/bin/sh
# This is wacco, a shell archive (produced by shar 3.49)
# To extract the files from this archive, save it to a file, remove
# everything above the "!/bin/sh" line above, and type "sh file_name".
# # made 05/18/1991 03:21 UTC by parag@hpsdeb # Source directory /users/parag/tools/wacco # # existing files will NOT be overwritten unless -c is specified # # This shar contains: # length mode name # ------ ---------- ------------------------------------------ # 2899 -r--r--r-- README # 1967 -r--r--r-- Makefile # 4588 -r--r--r-- wacco.1 # 15987 -r--r--r-- wacco.doc # 42897 -r--r--r-- wacco.doc.iw # 98569 -r-xr-xr-x wacco.doc.ps # 5585 -r--r--r-- wacco.w # 3575 -r--r--r-- defs.h # 1555 -r--r--r-- toks.h # 188 -r--r--r-- boolean.h # 2537 -r--r--r-- bitset.h # 1624 -r--r--r-- darray.h # 8290 -r--r--r-- table.h # 6447 -r--r--r-- bitset.C # 1739 -r--r--r-- tgram.w # 229 -r--r--r-- tgram.good # 261 -r--r--r-- tgram.bad # 6362 -r--r--r-- main.C # 3130 -r--r--r-- sym.C # 18403 -r--r--r-- parse.C # 3884 -r--r--r-- scan.C # 5614 -r--r--r-- build.C # 2302 -r--r--r-- check.C # 3603 -r--r--r-- read.C # 18803 -r--r--r-- gen.C # 3832 -r--r--r-- io.C # 120 -r--r--r-- version.C # # ============= README ============== if test -f 'README' -a X"$1" != X"-c"; then echo 'x - skipping README (File already exists)' else echo 'x - extracting README (Text)' sed 's/^X//' << 'SHAR_EOF' > 'README' && $Header: README,v 1.7 91/05/17 16:29:53 hmgr Exp $ X Copyright (c) 1991 by Parag Patel. All Rights Reserved. You can do what you wish with this as long as X (1) you do not claim it or any part of it as yours and X (2) you do not remove or alter my copyright in any file. This software is provided "AS IS" without any implied or express warranty as to its performance or to the results that may be obtained by using this software. It is completely unsupported. You're on your own. X X This is version 1.1 of Wacco, basically an LL(1) parser generator. Why Another Compiler COmpiler? Why not?!? X Wacco generates recursive-descent C++ code from an input file. The wacco file.w looks a lot like a yacc(1) input file, but with a lot more syntactic sugar added. 
Since the generated parser recurses, you can do attribute-driven parsing easily and even pass information into rules which could alter the parse. X I wrote wacco to give me a platform for experimenting with various error recovery schemes. A fairly cheesy first/follow set scheme is currently implemented. Wacco turned out to be useful in its own right and I never did get around to serious experimenting. X Wacco is written in itself. The file "wacco.w" describes its own format and was used to manually generate "parse.C" and "toks.h". (The original bootstrap version no longer exists. Wacco has evolved considerably from a much simpler version to the current implementation, so the old code would be useless anyway.) The files "parse.C" and "toks.h" are always shipped since there's no other way to build a working wacco. X The file "wacco.doc" describes the wacco file format in a tty-readable form. "Wacco.doc.iw" is the much prettier IslandWrite version of the document. "Wacco.doc.ps" is the Postscript output from IslandWrite. Wacco.1 describes only the command-line options. X There are few comments and lots of ugly non-OO code throughout wacco. It had evolved from a straight C implementation and I never got around to cleaning it up. Sorry. X Wacco should port and run easily on most C++ systems. It does need C++ 2.0 of some flavor. It's been successfully built on HP-UX s300 and s800 systems, Sparc, and 4.3BSD running on HP hardware. You may need to tweak some -D defines in the Makefile. If a long is NOT 32 bits, you may have to perform major surgery on bitset.h and bitset.C. X All you should need to do is modify CFLAGS in the Makefile, then type "make". The Makefile should come set up for HP-UX systems. Type "make tst" to build a simple test program using "tgram.w". The files to be installed wherever you prefer are "wacco" and "wacco.1". X The code is somewhat commented. Feel free to hack away, add new features, or fix my screwups. 
If you make mods you feel are useful, or fix some bug, please send me the cdiffs so I can make them available to others too. X X X X -- Parag Patel <parag@sde.hp.com> SHAR_EOF chmod 0444 README || echo 'restore of README failed' Wc_c="`wc -c < 'README'`" test 2899 -eq "$Wc_c" || echo 'README: original size 2899, current size' "$Wc_c" fi # ============= Makefile ============== if test -f 'Makefile' -a X"$1" != X"-c"; then echo 'x - skipping Makefile (File already exists)' else echo 'x - extracting Makefile (Text)' sed 's/^X//' << 'SHAR_EOF' > 'Makefile' && # Copyright (c) 1991 by Parag Patel. All Rights Reserved. # $Header: Makefile,v 1.27 91/05/17 16:29:50 hmgr Exp $ X CXX = CC .SUFFIXES: .C .C.o: X $(CXX) $(CFLAGS) -c $< X # system-dependent options - use any appropriate -D<sys> macros # -DBSD for a BSD derivative (Sun) # -Dpid_t=long if your headers don't define a pid_t type # -DFREE_TAKES_CHAR if you have a free(char*) instead of free(void*) (Sun) CFLAGS = -g LIBS = libwacco.a X SRCS = README Makefile wacco.1 wacco.doc wacco.doc.iw wacco.doc.ps wacco.w \ X defs.h toks.h boolean.h bitset.h darray.h table.h \ X bitset.C tgram.w tgram.good tgram.bad \ X main.C sym.C parse.C scan.C build.C check.C read.C gen.C \ X io.C version.C X OBJS = main.o sym.o parse.o scan.o build.o check.o read.o gen.o bitset.o X wacco : $(OBJS) libwacco.a X $(CXX) $(CFLAGS) -o wacco $(OBJS) $(LIBS) X libwacco.a : io.o version.o X ar ru libwacco.a $(?) 
X -[ -x /usr/bin/ranlib ] && ranlib libwacco.a X tst: parser.o scanner.o X $(CXX) $(CFLAGS) -o tst parser.o scanner.o $(LIBS) $(LFLAGS) -ll X -./tst <tgram.bad X ./tst <tgram.good X parser.C scanner.l: tgram.w wacco X ./wacco tgram.w X tar: $(SRCS) X tar -cvf - $(SRCS) | compress >wacco.tar.Z X shar: $(SRCS) X shar -ac -nwacco -l50 -owacco-shar $(SRCS) X clean: X rm -f wacco *.o libwacco.a wacco.tar.Z* wacch.shar* tst parser.C scanner.l X files: X @echo $(SRCS) X main.o : main.C toks.h defs.h boolean.h darray.h table.h bitset.h sym.o : sym.C defs.h boolean.h darray.h table.h bitset.h parse.o : parse.C toks.h defs.h boolean.h darray.h table.h bitset.h scan.o : scan.C toks.h defs.h boolean.h darray.h table.h bitset.h build.o : build.C defs.h boolean.h darray.h table.h bitset.h check.o : check.C defs.h boolean.h darray.h table.h bitset.h read.o : read.C toks.h defs.h boolean.h darray.h table.h bitset.h gen.o : gen.C toks.h defs.h boolean.h darray.h table.h bitset.h io.o : io.C toks.h defs.h boolean.h darray.h table.h bitset.h version.o : version.C bitset.o : bitset.C bitset.h boolean.h SHAR_EOF chmod 0444 Makefile || echo 'restore of Makefile failed' Wc_c="`wc -c < 'Makefile'`" test 1967 -eq "$Wc_c" || echo 'Makefile: original size 1967, current size' "$Wc_c" fi # ============= wacco.1 ============== if test -f 'wacco.1' -a X"$1" != X"-c"; then echo 'x - skipping wacco.1 (File already exists)' else echo 'x - extracting wacco.1 (Text)' sed 's/^X//' << 'SHAR_EOF' > 'wacco.1' && .\" Copyright (c) 1991 by Parag Patel. All Rights Reserved. .\" $Header: wacco.1,v 1.13 91/02/22 16:04:11 hmgr Exp $ .TH WACCO 1 unsupported .ad b .SH NAME wacco \- why another compiler-compiler? .SH SYNOPSIS .B wacco .RB [ -dciOCL ] .RB [ -h header] .RB [ -p parser] .RB [ -s scanner] [file] .SH DESCRIPTION .I Wacco is another compiler-compiler. (Why another compiler-compiler you may ask? Why not!) 
It has some rather convenient features with a lot of syntactic sugar tossed on top of what .IR yacc (1) provides. .PP Unlike .IR yacc (1), .I wacco generates a top-down recursive-descent LL(1) parser instead of a bottom-up LALR parser. Although .I wacco generated parsers handle a smaller class of grammars than .IR yacc (1), in practice, there is rarely any need for a full LALR parser. It is much easier to deal with error recovery in a top-down parser. It is also possible to re-direct and even completely alter the parse on the fly, as well as perform attribute-driven parsing. .PP .I Wacco generates a parser that automatically attempts to resync on errors based on some heuristics on the first and follow sets of non-terminals. Admittedly this is a far from optimal error-handling system, but it is much better than what .IR yacc (1) provides (skip X tokens, then continue!). Future versions of .I wacco may provide much more intelligent error-recovery systems. .PP .I Wacco also allows using its parser in an attribute-driven manner. Information may be passed down to the right-hand side of an expression even though that expression hasn't yet been parsed. Different rules may have different types associated with them. The C++ compiler will perform the type-checking for you! No more funny unions and hoping that you didn't make a mistake! .PP Token values do not have to be explicitly defined. String and character tokens may be specified implicitly as well, rather than creating a dummy symbol for them. .I Wacco will generate a header file containing definitions for all the tokens. .PP There is support for a somewhat smarter scanner. Errors will be (hopefully) printed out in a clear and simple manner. .PP .I Wacco currently generates only C++ code. Some day it may optionally generate straight C code as well (but don't hold your breath). .PP The grammar format is described in the .I wacco documentation since it is too lengthy to repeat here. 
See the .I wacco.doc files for more information. .SS Options .I Wacco expects a grammar on stdin if .I file is not specified on the command line. It will generate the files .I parser.C and .I tokens.h by default. If there is a scanner section in the input file, then the file .I scanner.l will also be generated. .TP .B -d Dump mode. Only prints (somewhat) interesting information about what .I wacco thinks the grammar looks like, first and follow sets, and other miscellaneous stuff. .TP .B -i Do not generate code for scanning case-insensitive strings. If the .B "string" construct is used in the grammar source, .I wacco will normally generate code like .BR [Ss][Tt][Rr][Ii][Nn][Gg] . This option inhibits such behavior to allow exact matches. .TP .B -c Normally, .I wacco will generate temporary output files and then compare them with the originals. The originals are replaced only if the new files are different. This is very handy for use inside makefiles, where doing things like this gets ugly. This option turns off this feature and always generates the output files. .TP .B -O Turns off optimization. Normally, .I wacco expands non-terminals that are only used once in the code rather than creating functions for them. If you use the "return" operator, you must use this option for now. If you use the "$?" construct, optimization will be automatically turned off. .TP .B -C Do not output the embedded user code within the grammar. This generates a parser that either accepts or rejects its input, only printing errors. It is handy for verifying a grammar. .TP .B -L Do not generate the "#line" entries for the original .I wacco source file within the parser. .TP .BI "-h " header Create a file named .I header instead of the default "tokens.h". .TP .BI "-p " parser Create a file named .I parser instead of the default "parser.C". .TP .BI "-s " scanner Create a file named .I scanner instead of the default "scanner.l". 
.SH FILES wacco.doc wacco.doc.iw wacco.doc.ps .br tokens.h scanner.l parser.C ./.wacco.tmp .SH NOTES The scanner generated may be ``compiled'' by either .IR lex (1) or .IR flex (1), although .I flex is highly recommended. .SH AUTHOR Copyright (c) 1991 by Parag Patel. All Rights Reserved. SHAR_EOF chmod 0444 wacco.1 || echo 'restore of wacco.1 failed' Wc_c="`wc -c < 'wacco.1'`" test 4588 -eq "$Wc_c" || echo 'wacco.1: original size 4588, current size' "$Wc_c" fi # ============= wacco.doc ============== if test -f 'wacco.doc' -a X"$1" != X"-c"; then echo 'x - skipping wacco.doc (File already exists)' else echo 'x - extracting wacco.doc (Text)' sed 's/^X//' << 'SHAR_EOF' > 'wacco.doc' && Copyright (c) 1991 by Parag Patel. All Rights Reserved. << $Header: wacco.doc,v 1.25 91/02/22 16:04:23 hmgr Exp $ >> X << Please see the wacco(1) man page for details on its usage. >> << Only the grammar format is described here. >> X X The underlying philosophy in wacco is that the code generated should be exactly like what someone would generate by hand, if they were writing a recursive-descent compiler manually. X X X The basic grammar file format is: X X /* C style comments */ X %opt <directives> X { <header> } X <rules> // C++ style comments X $$ X <scanner> X Wacco directives may be placed on the optional "%opt" line at the top of the source grammar. Only one such line is allowed in the grammar source, and it MUST be first in the source. The directives are actually the command-line options for wacco! Options may thus be set either on the command line, or in the wacco source itself. The entire "%opt" line is parsed as if it were the command line. Please see the man page for descriptions of the command-line options. X The header section (which is optional) is a block of code in curly-braces {} that is put at the top of the output parser.C file. This is the place to include files, define classes, or set up global variables. 
Naturally, there are no curlies {} if there is no need for a header section. X The scanner section (the two "$$" and everything after) is entirely optional. It is included in the grammar file to make it easy to refer to the actual values of tokens without explicitly defining those values. X Without any of the optional parts, a grammar consists only of rules. X X X The rules look much like those of yacc at first glance but there are some interesting differences. A rule looks like: X X ID <TYPE> : stuff ; X The ID on the left-hand side is a non-terminal and so is eventually turned into a function. The TYPE is the type that this function accepts and returns through a reference argument. It is optional; if present it must be in angle-brackets <>, and if absent it is assumed to be "int". It can be used to pass information into a function (rule) or to get information out of it. X X ID : stuff ; X ID <TYPE> : stuff ; X A vertical-bar "|" may be used to avoid duplicating the left-hand side: X X ID : stuff1 ; X ID : stuff2 ; X is equivalent to X X ID : stuff1 | stuff2 ; X X << For the rest of this document, the conventions are that terminals will be X in uppercase and non-terminals in lowercase. >> X X The "stuff" on the right-hand side can get kind of interesting. Like yacc, this is basically a list of terminals or non-terminals that are expected in sequence. X X parenexpr : LPAREN expr RPAREN ; X X Terminals can be described in several different ways. X Simple character tokens are straightforward. Their token value is always that of the character they represent. The null character '\0' may not be used as a token - its value is used for other things internally. X X parenexpr : '(' expr ')' ; X For more complicated strings, just use the strings themselves! X X parenexpr : "<<" expr ">>" ; X The same string may be used in other rules to refer to that token. X Also, any identifier name may be used to define a terminal. 
If that id does not appear on the left side of a colon `:', then it is assumed to be a terminal symbol in the grammar. X Token codes for terminals are automatically assigned and stored in the "tokens.h" header file. The token value of a string is pretty much inaccessible. A character constant will be its own token. Any other terminal name like LPAREN above will be in the header file as an enum with the same name. X X X Actions (code) may be embedded anywhere on the right-hand side within pairs of curly-braces {}. X X parenexpr: '(' expr { $$ = $expr; } ')' ; X This introduces some other features that wacco has which yacc doesn't. First though, the value that the non-terminal returns is always "$$". X The values of the right-hand side are referred to directly via their symbolic names. Thus we use "$expr" instead of yacc's "$2"! Also, "expr"s must return "int"s or the C++ compiler will complain! X Wacco generates an appropriate temporary variable if and only if it is used by referring to a "$$" inside some code for that rule. Thus parenexpr above will have an in/out argument defined for it. If there were no code in {}, then parenexpr wouldn't be passed anything at all. X X Actually, the TYPE specifier of a non-terminal may be a lot more complicated than just a simple type: X X example <double d; int i, j> : ... { $$.d = 0.0; $$.i = 34; } X In this case, wacco creates a struct for this non-terminal instead of a simple variable. The contents of the <> are put into this struct. This allows passing more info in and out of a non-term without having to create a dummy struct by hand. It is also passed to the non-terminal function by reference rather than copying, and thus is very efficient. X X expr <int left, right> : ... ; X example : expr ';' { $$ = $expr.left + $expr.right; } ; X Note that all exported non-terms MUST have simple types, to avoid bogus structure naming conventions. 
If you must have a complicated type returned from a start-symbol, you should create a specially named struct or class and use it instead. X Also, simple types must not be named. The following is illegal as well as redundant, and kind of silly anyway: X X expr <int var> : ... ; X X X If we have 2 "expr"s on the right, things get a little messier: X X example: '(' expr ',' expr ')' X { $$ = $expr1 + $expr2; }; or X example: '(' expr=front ',' expr=back ')' X { $$ = $front + $back; }; X The second form introduces the ability to name (alias) one of the right-hand side's non-terminal names! Here we alias "expr1" as "front" and "expr2" as "back" for just this particular right-hand side. X X X Since wacco generates a C++ recursive-descent parser, we can do even more interesting things on the right. Wacco passes the local vars that store return values by reference. Thus we can pass information into a rule as well as get stuff out of it. X X example: { $expr = $$; } '(' expr ')' { $$ = $expr; }; X This initializes the temp-var used to store the return value from "expr" to whatever was passed in to "example", then passes it to "expr". If a non-terminal never uses "$$", then it is assumed to not return anything, and no temp-var will be declared or passed into it. X Other things that one can do: X X example: '(' { int v = 2; } expr ')' { v = $expr; }; X and create temp vars anywhere you want. Wacco carefully avoids putting out unnecessary sets of blocks in the output parser file. X To generate incomplete blocks, and allow a weird sort of free-form grammar, the %{%} format may be used wherever a {} is normally used. This allows creating incomplete blocks like so: X X example: '(' %{ if (somevar) { %} expr ')' %{ } %} ; X Curly-braces are not counted within %{%} blocks, and %{%} blocks may be used wherever {} blocks are allowed. 
X X The empty rule may not be implicitly specified as in yacc, but must be defined with the special "[]" symbol: X X null: [] ; X expr: '(' expr ')' | [] ; X An empty statement is an error in wacco to help protect against typos and other mistakes. X X X Right-hand sides may have parentheses for grouping. Basically, a function must be generated for every parenthesized expression to maintain the parsing semantics: X X value: (ID | INT) | []; X is the equivalent of: X X value: v1 | []; X v1: ID | INT; X Just like every other non-terminal, parenthesized expressions have return values, types, aliases, and may be referred to in other parts of the right-hand side. The default type is the type of the enclosing parens or left-hand side for the outer-most parens: X X example (<long> ID | INT) { $$ = $_; }; X Multiple sets of parens on the right may be referred to as "$_1", "$_2", and so on. They may be named as well: X X example<float>: (ID | FLOAT)=num { $$ = $num; }; X Here the parens inherit the type "float" from "example". X Since the left-hand side may be used on the right for recursive functions, so may parenthesized expressions. The names just get a little strange. X X strange: (ID (OP # #1 #2 #3 #* | []) | []); X The inner "#" refers to the inner-most set of parens enclosing the "OP...". The strings "#" and "#1" are equivalent and refer to this inner-most set of parens. "#2" refers to the next outer parens starting the "ID...". "#3" and "#*" refer to the name of the left-hand side, just for completeness. These can be viewed as the outermost "parens" in the expression. Ugly but sometimes necessary. X X X Other things defined in "tokens.h" include the end-of-input token EOI which has value 0, and the constants RETOK and RETERR, for appropriate return values. These have the values of TRUE (1) and FALSE (0) respectively. These may be used in the right-hand side of rules if it is determined that further parsing of rules is unnecessary. 
X X parenexpr: LPAREN expr { if ($expr == BOGUS) return RETERR; } RPAREN; X The return code of a rule is always available as the magic string "$?" directly after that particular rule is called: X X parenexpr: LPAREN expr { if ($? != RETOK) return RETERR; } RPAREN; X The return code is overwritten with each call to a non-terminal on the right-hand side, so if a previous return value is needed, you must save it in some variable yourself. X The generated parser code does not look at the actual return value of non-terminals (functions), so other return values may be used if desired. X X X By default, the first rule in the grammar is considered to be the start symbol. Instead of calling "yyparse()" to initiate the parse, the function to call is the name of the left-hand ID in the first rule. It is called with no arguments. It returns either RETOK or RETERR depending on whether the parse succeeded or not. X X firstsymbol: . . . ; X . . . X X main() X { X if (firstsymbol() == RETOK) X return OK; X return ERR; X } X But you don't have to have just one entry point! Adding a "%export" modifier after a non-terminal just before the ':' causes that symbol to become callable from outside the grammar: X X thing<mytype> %export : . . . ; X X func() { mytype var; return thing(var); } X The first non-terminal in the grammar is automatically exported unless "%export" is used somewhere in the grammar. Also, notice that if a "type" is defined and used for a non-terminal, that type must be passed in by reference to that function. X The "%export" feature lets you call several non-terminals in the grammar. This can be used to export parts of a grammar, say sub-expression parsing, or let you put several different parsers into one grammar file. All exported non-terminals are also listed as "extern"s in the "tokens.h" header file. X X X The scanner section is optional. If there is a "$$" at the end of the file, the rest is considered to be almost straight lex(1) source. 
If there is a "$$", every terminal must have a lex value associated with it. Character and string constants are self-defining. Other terminals are described in the lex section. X An example: X X expr: LPAREN expr RPAREN | "id" | []; X X $$ X X %% X X "." { return (int)EOI; } X X $LPAREN "("|"[" X $RPAREN ")"|"]" X X [ \t\v\n\f] ; X . { w_scanerr("Illegal character %d (%c)", yytext[0], yytext[0]); } X The string "id" naturally stands for itself. LPAREN and RPAREN are described in the lex section in the reverse of the usual lex order (token name first, then pattern). Wacco will convert those lines starting with a `$' into the appropriate lex output. This is not only to make sure that all terminals are defined, but allows defining a language without ever having to manually define token ids for any terminal symbol! X The default scanner (located in -lwacco) maintains its own I/O file pointer. This is so that user code can implement the equivalent of "#include" without too much work. The functions in the scanner include: X X int w_openfile(char *fname) // open the file with the specified name X X void w_closefile() // close the last opened file X X void w_setfile(FILE *f) // set the current file to this X X FILE *w_getfile() // return the currently opened file X X int w_currcol() // the current column in the input X X int w_currline() // the current line in the input X X char *w_getcurrline() // the text of the current line X X int w_input() // basic I/O routines which are X int w_unput(int c) // to be used by the scanner X void w_output(int c) X X You should call either w_setfile() or w_openfile() before starting the parse or the default scanner will probably dump core. 
X X The functions that the parser expects to have available are: X X int w_gettoken() // get the next token - usually calls yylex() X // - must return EOI on end-of-input X X int w_scanerr() // printf-type error printing routine X // - must always return RETERR X // - is called with a NULL argument X // when just skipping a token in the input X These are either in the wacco library -lwacco, or must be provided by the user. X The default w_scanerr() will try to print the line that had the error, and underneath it print "^" where the error occurred and "*" where tokens were skipped when re-syncing. Because of some lex(1) funnies, this doesn't always work as expected. When I do away with the need for lex, this won't be a problem anymore. X X Some other convenient functions defined in parser.C include: X X int w_nexttoken() // return the value of the next token but don't X // scan it yet - calls w_gettoken() at most once X // - useful for token lookahead X X void w_skiptoken() // scan the current token - the next call to X // w_nexttoken() will actually read another token X X char *w_tokenname(int tokid) // return the string name of a token X // whose id is tokid X These are only really useful if you are writing your own scanner instead of using lex. w_nexttoken() and w_skiptoken() can also be used to somewhat direct the parse. If you provide your own infinite push-back stack of tokens, you can completely alter the parse at run-time! X The program flex(1) may be used instead of lex(1) if desired, and is highly recommended. X The extern for "yytext" is automatically declared in parser.C. Unfortunately, it may be wrong for the scanner generator actually being used. To change the definition, the macro YYTEXT_DECL may be redefined at the top of your wacco grammar if you wish to use flex: X X { X #undef YYTEXT_DECL X #define YYTEXT_DECL char *yytext X } X ... 
X X X My original plan was to write a scanner-generator directly into wacco, but since flex(1) is now available, which is very fast and generates excellent scanners, I now have no plans to do anything to the scanning parts of wacco. X X X X -- Parag Patel X X X ================= E X A M P L E G R A M M A R ================== X // This is the usual required calculator sample. It can still use // a LOT of work, but it illustrates the basics. Note that the // precedence of operators is all wrong. X { #include <stdio.h> #include <stdlib.h> } X calc X : %{ X while (w_nexttoken() != EOI) { X %} X expr ([] | '=' | ';' | ',') X %{ X printf("%f\n", $expr); X } X %} X | [] X ; X expr<double> X : term { $binop_expr = $term; } binop_expr { $$ = $binop_expr; } X ; X binop_expr<double> X : '+' expr { $$ += $expr; } X | '-' expr { $$ -= $expr; } X | '*' expr { $$ *= $expr; } X | '/' expr { $$ /= $expr; } X | '&' expr { $$ = (int)$$ & (int)$expr; } X | '|' expr { $$ = (int)$$ | (int)$expr; } X | '^' expr { $$ = (int)$$ ^ (int)$expr; } X | "<<" expr { $$ = (int)$$ << (int)$expr; } X | ">>" expr { $$ = (int)$$ >> (int)$expr; } X | "&&" expr { $$ = $$ && $expr; } X | "||" expr { $$ = $$ || $expr; } X | [] X ; X term<double> X : DOUBLE { $$ = atof((char *)yytext); } X | '-' expr { $$ = -$expr; } X | '~' expr { $$ = ~(int)$expr; } X | '!' expr { $$ = !$expr; } X | '(' expr ')' { $$ = $expr; } X ; X { X main() X { X w_setfile(stdin); X calc(); X } } X $$ X D [0-9] L [_A-Za-z] X %% X "." { return (int)EOI; } X $DOUBLE ({D}+)|({D}+\.{D}+)|({D}+[Ee]-?{D}+)|({D}+\.{D}+[Ee]-?{D}+) X "#".*$ ; X [ \t\v\n\f] ; . { w_scanerr("Illegal character %d (%c)", yytext[0], yytext[0]); } SHAR_EOF chmod 0444 wacco.doc || echo 'restore of wacco.doc failed' Wc_c="`wc -c < 'wacco.doc'`" test 15987 -eq "$Wc_c" || echo 'wacco.doc: original size 15987, current size' "$Wc_c" fi true || echo 'restore of wacco.doc.iw failed' echo End of part 1, continue with part 2 exit 0 exit 0 # Just in case... 
-- Kent Landfield INTERNET: kent@sparky.IMD.Sterling.COM Sterling Software, IMD UUCP: uunet!sparky!kent Phone: (402) 291-8300 FAX: (402) 291-4362 Please send comp.sources.misc-related mail to kent@uunet.uu.net.
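
A note for readers who want a feel for the style before unpacking: wacco.doc says the generated code should look like what someone would write by hand. The sketch below is such a hand-written version of the expr / binop_expr / term rules from the calculator sample, cut down to + - * / and parentheses. It is NOT Wacco output and makes no claim about what gen.C really emits. Lexer, parse_term, parse_binop_expr, parse_expr and evaluate are made-up names for illustration; only the EOI, RETOK and RETERR values are taken from wacco.doc, and DOUBLE = 256 is an arbitrary pick. The next()/skip() pair only imitates the one-token lookahead described for w_nexttoken() and w_skiptoken().

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

enum { EOI = 0, DOUBLE = 256 };   // EOI is 0 per wacco.doc; DOUBLE is an assumed value
enum { RETERR = 0, RETOK = 1 };   // FALSE/TRUE per wacco.doc

// Hypothetical in-memory scanner with a one-token lookahead buffer.
struct Lexer {
    std::string text;
    std::size_t pos = 0;
    bool have = false;            // true when tok holds an unconsumed token
    int tok = EOI;
    double val = 0.0;             // value of the last DOUBLE scanned

    explicit Lexer(const std::string& s) : text(s) {}

    int next() {                  // peek; scans at most once per token
        if (!have) { tok = scan(); have = true; }
        return tok;
    }
    void skip() { next(); have = false; }   // consume the peeked token

    int scan() {
        while (pos < text.size() &&
               std::isspace(static_cast<unsigned char>(text[pos]))) ++pos;
        if (pos >= text.size()) return EOI;
        if (std::isdigit(static_cast<unsigned char>(text[pos]))) {
            char* end = nullptr;
            val = std::strtod(text.c_str() + pos, &end);
            pos = static_cast<std::size_t>(end - text.c_str());
            return DOUBLE;
        }
        return static_cast<unsigned char>(text[pos++]);  // a char is its own token
    }
};

int parse_expr(Lexer& lx, double& out);

// term<double> : DOUBLE | '-' expr | '(' expr ')' ;
int parse_term(Lexer& lx, double& out) {
    switch (lx.next()) {
    case DOUBLE:
        out = lx.val; lx.skip(); return RETOK;
    case '-': {
        lx.skip();
        double e = 0.0;
        if (parse_expr(lx, e) != RETOK) return RETERR;
        out = -e; return RETOK;
    }
    case '(': {
        lx.skip();
        double e = 0.0;
        if (parse_expr(lx, e) != RETOK) return RETERR;
        if (lx.next() != ')') return RETERR;
        lx.skip();
        out = e; return RETOK;
    }
    default:
        return RETERR;
    }
}

// binop_expr<double>: inout carries the left operand in and the result out.
int parse_binop_expr(Lexer& lx, double& inout) {
    int op = lx.next();
    if (op != '+' && op != '-' && op != '*' && op != '/')
        return RETOK;             // the empty alternative []
    lx.skip();
    double rhs = 0.0;
    if (parse_expr(lx, rhs) != RETOK) return RETERR;
    switch (op) {
    case '+': inout += rhs; break;
    case '-': inout -= rhs; break;
    case '*': inout *= rhs; break;
    case '/': inout /= rhs; break;
    }
    return RETOK;
}

// expr<double> : term { $binop_expr = $term; } binop_expr { $$ = $binop_expr; } ;
int parse_expr(Lexer& lx, double& out) {
    double t = 0.0;
    if (parse_term(lx, t) != RETOK) return RETERR;
    double b = t;                 // pass the term down by reference
    if (parse_binop_expr(lx, b) != RETOK) return RETERR;
    out = b;
    return RETOK;
}

// Parse one whole expression; trailing input counts as an error.
int evaluate(const std::string& s, double& out) {
    Lexer lx(s);
    if (parse_expr(lx, out) != RETOK) return RETERR;
    return lx.next() == EOI ? RETOK : RETERR;
}
```

With this, evaluate("2*3+4", v) leaves 14 in v rather than 10. As the sample grammar itself warns, operator precedence is all wrong: each operator applies to everything on its right.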