[comp.sources.unix] v19i058: Flex, a fast LEX replacement, Part04/07

rsalz@uunet.uu.net (Rich Salz) (06/23/89)

Submitted-by: Vern Paxson <vern@csam.lbl.gov>
Posting-number: Volume 19, Issue 58
Archive-name: flex2/part04

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of archive 4 (of 7)."
# Contents:  flex/flex.1 flex/flexdef.h
# Wrapped by rsalz@prune.bbn.com on Thu Jun 22 19:01:47 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'flex/flex.1' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'flex/flex.1'\"
else
echo shar: Extracting \"'flex/flex.1'\" \(20602 characters\)
sed "s/^X//" >'flex/flex.1' <<'END_OF_FILE'
X.TH FLEX 1 "20 June 1989" "Version 2.1"
X.SH NAME
Xflex - fast lexical analyzer generator
X.SH SYNOPSIS
X.B flex
X[
X.B -bdfipstvFILT -c[efmF] -Sskeleton_file
X] [ 
X.I filename
X]
X.SH DESCRIPTION
X.I flex
Xis a rewrite of
X.I lex
Xintended to right some of that tool's deficiencies: in particular,
X.I flex
Xgenerates lexical analyzers much faster, and the analyzers use
Xsmaller tables and run faster.
X.SH OPTIONS
XIn addition to lex's
X.B -t
Xflag, flex has the following options:
X.TP
X.B -b
XGenerate backtracking information to
X.I lex.backtrack.
XThis is a list of scanner states which require backtracking
Xand the input characters on which they do so.  By adding rules one
Xcan remove backtracking states.  If all backtracking states
Xare eliminated and
X.B -f
Xor
X.B -F
Xis used, the generated scanner will run faster (see the
X.B -p
Xflag).  Only users who wish to squeeze every last cycle out of their
Xscanners need worry about this option.
X.TP
X.B -d
Xmakes the generated scanner run in
X.I debug
Xmode.  Whenever a pattern is recognized the scanner will
Xwrite to
X.I stderr
Xa line of the form:
X.nf
X
X    --accepting rule #n
X
X.fi
XRules are numbered sequentially with the first one being 1.  Rule #0
Xis executed when the scanner backtracks; Rule #(n+1) (where
X.I n
Xis the number of rules) indicates the default action; Rule #(n+2) indicates
Xthat the input buffer is empty and needs to be refilled and then the scan
Xrestarted.  Rules beyond (n+2) are end-of-file actions.
X.TP
X.B -f
Xhas the same effect as lex's -f flag (do not compress the scanner
Xtables); the mnemonic changes from
X.I fast compilation
Xto (take your pick)
X.I full table
Xor
X.I fast scanner.
XThe actual compilation takes
X.I longer,
Xsince flex is I/O bound writing out the big table.
X.IP
XThis option is equivalent to
X.B -cf
X(see below).
X.TP
X.B -i
Xinstructs flex to generate a
X.I case-insensitive
Xscanner.  The case of letters given in the flex input patterns will
Xbe ignored, and the rules will be matched regardless of case.  The
Xmatched text given in
X.I yytext
Xwill have the preserved case (i.e., it will not be folded).
X.TP
X.B -p
Xgenerates a performance report to stderr.  The report
Xconsists of comments regarding features of the flex input file
Xwhich will cause a loss of performance in the resulting scanner.
XNote that the use of
X.I REJECT
Xand variable trailing context (see
X.B BUGS)
Xentails a substantial performance penalty; use of
X.I yymore(),
Xthe
X.B ^
Xoperator,
Xand the
X.B -I
Xflag entail minor performance penalties.
X.TP
X.B -s
Xcauses the
X.I default rule
X(that unmatched scanner input is echoed to
X.I stdout)
Xto be suppressed.  If the scanner encounters input that does not
Xmatch any of its rules, it aborts with an error.  This option is
Xuseful for finding holes in a scanner's rule set.
X.TP
X.B -v
Xhas the same meaning as for lex (print to
X.I stderr
Xa summary of statistics of the generated scanner).  Many more statistics
Xare printed, though, and the summary spans several lines.  Most
Xof the statistics are meaningless to the casual flex user, but the
Xfirst line identifies the version of flex, which is useful for figuring
Xout where you stand with respect to patches and new releases.
X.TP
X.B -F
Xspecifies that the
X.ul
Xfast
Xscanner table representation should be used.  This representation is
Xabout as fast as the full table representation
X.ul
X(-f),
Xand for some sets of patterns will be considerably smaller (and for
Xothers, larger).  In general, if the pattern set contains both "keywords"
Xand a catch-all, "identifier" rule, such as in the set:
X.nf
X
X	"case"    return ( TOK_CASE );
X	"switch"  return ( TOK_SWITCH );
X	...
X	"default" return ( TOK_DEFAULT );
X	[a-z]+    return ( TOK_ID );
X
X.fi
Xthen you're better off using the full table representation.  If only
Xthe "identifier" rule is present and you then use a hash table or some such
Xto detect the keywords, you're better off using
X.ul
X-F.
X.IP
XThis option is equivalent to
X.B -cF
X(see below).
X.TP
X.B -I
Xinstructs flex to generate an
X.I interactive
Xscanner.  Normally, scanners generated by flex always look ahead one
Xcharacter before deciding that a rule has been matched.  At the cost of
Xsome scanning overhead, flex will generate a scanner which only looks ahead
Xwhen needed.  Such scanners are called
X.I interactive
Xbecause if you want to write a scanner for an interactive system such as a
Xcommand shell, you will probably want the user's input to be terminated
Xwith a newline, and without
X.B -I
Xthe user will have to type a character in addition to the newline in order
Xto have the newline recognized.  This leads to dreadful interactive
Xperformance.
X.IP
XIf all this seems to confusing, here's the general rule: if a human will
Xbe typing in input to your scanner, use
X.B -I,
Xotherwise don't; if you don't care about how fast your scanners run and
Xdon't want to make any assumptions about the input to your scanner,
Xalways use
X.B -I.
X.IP
XNote,
X.B -I
Xcannot be used in conjunction with
X.I full
Xor
X.I fast tables,
Xi.e., the
X.B -f, -F, -cf,
Xor
X.B -cF
Xflags.
X.TP
X.B -L
Xinstructs flex to not generate
X.B #line
Xdirectives (see below).
X.TP
X.B -T
Xmakes flex run in
X.I trace
Xmode.  It will generate a lot of messages to stdout concerning
Xthe form of the input and the resultant non-deterministic and deterministic
Xfinite automatons.  This option is mostly for use in maintaining flex.
X.TP 
X.B -c[efmF]
Xcontrols the degree of table compression.
X.B -ce
Xdirects flex to construct
X.I equivalence classes,
Xi.e., sets of characters
Xwhich have identical lexical properties (for example, if the only
Xappearance of digits in the flex input is in the character class
X"[0-9]" then the digits '0', '1', ..., '9' will all be put
Xin the same equivalence class).
X.B -cf
Xspecifies that the
X.I full
Xscanner tables should be generated - flex should not compress the
Xtables by taking advantages of similar transition functions for
Xdifferent states.
X.B -cF
Xspecifies that the alternate fast scanner representation (described
Xabove under the
X.B -F
Xflag)
Xshould be used.
X.B -cm
Xdirects flex to construct
X.I meta-equivalence classes,
Xwhich are sets of equivalence classes (or characters, if equivalence
Xclasses are not being used) that are commonly used together.
XA lone
X.B -c
Xspecifies that the scanner tables should be compressed but neither
Xequivalence classes nor meta-equivalence classes should be used.
X.IP
XThe options
X.B -cf
Xor
X.B -cF
Xand
X.B -cm
Xdo not make sense together - there is no opportunity for meta-equivalence
Xclasses if the table is not being compressed.  Otherwise the options
Xmay be freely mixed.
X.IP
XThe default setting is
X.B -cem
Xwhich specifies that flex should generate equivalence classes
Xand meta-equivalence classes.  This setting provides the highest
Xdegree of table compression.  You can trade off
Xfaster-executing scanners at the cost of larger tables with
Xthe following generally being true:
X.nf
X
X    slowest            smallest
X               -cem
X               -ce
X               -cm
X               -c
X               -c{f,F}e
X               -c{f,F}
X    fastest            largest
X
X.fi
XNote that scanners with the smallest tables compile the quickest, so
Xduring development you will usually want to use the default, maximal
Xcompression.
X.TP
X.B -Sskeleton_file
Xoverrides the default skeleton file from which flex constructs
Xits scanners.  You'll never need this option unless you are doing
Xflex maintenance or development.
X.SH INCOMPATIBILITIES WITH LEX
X.I flex
Xis fully compatible with
X.I lex
Xwith the following exceptions:
X.IP -
XThere is no run-time library to link with.  You needn't
Xspecify
X.I -ll
Xwhen linking, and you must supply a main program.  (Hacker's note: since
Xthe lex library contains a main() which simply calls yylex(), you actually
X.I can
Xbe lazy and not supply your own main program and link with
X.I -ll.)
X.IP -
Xlex's
X.B %r
X(Ratfor scanners) and
X.B %t
X(translation table) options
Xare not supported.
X.IP -
XThe do-nothing
X.ul
X-n
Xflag is not supported.
X.IP -
XWhen definitions are expanded, flex encloses them in parentheses.
XWith lex, the following
X.nf
X
X    NAME    [A-Z][A-Z0-9]*
X    %%
X    foo{NAME}?      printf( "Found it\\n" );
X    %%
X
X.fi
Xwill not match the string "foo" because when the macro
Xis expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
Xand the precedence is such that the '?' is associated with
X"[A-Z0-9]*".  With flex, the rule will be expanded to
X"foo([A-z][A-Z0-9]*)?" and so the string "foo" will match.
XNote that because of this, the
X.B ^, $, <s>,
Xand
X.B /
Xoperators cannot be used in a definition.
X.IP -
XThe undocumented lex-scanner internal variable
X.B yylineno
Xis not supported.
X.IP -
XThe
X.B input()
Xroutine is not redefinable, though may be called to read characters
Xfollowing whatever has been matched by a rule.  If
X.B input()
Xencounters an end-of-file the normal
X.B yywrap()
Xprocessing is done.  A ``real'' end-of-file is returned as
X.I EOF.
X.IP
XInput can be controlled by redefining the
X.B YY_INPUT
Xmacro.
XYY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)".  Its
Xaction is to place up to max_size characters in the character buffer "buf"
Xand return in the integer variable "result" either the
Xnumber of characters read or the constant YY_NULL (0 on Unix systems)
Xsystems) to indicate EOF.  The default YY_INPUT reads from the
Xfile-pointer "yyin" (which is by default
X.I stdin),
Xso if you
Xjust want to change the input file, you needn't redefine
XYY_INPUT - just point yyin at the input file.
X.IP
XA sample redefinition of YY_INPUT (in the first section of the input
Xfile):
X.nf
X
X    %{
X    #undef YY_INPUT
X    #define YY_INPUT(buf,result,max_size) \\
X        result = (buf[0] = getchar()) == EOF ? YY_NULL : 1;
X    %}
X
X.fi
XYou also can add in things like counting keeping track of the
Xinput line number this way; but don't expect your scanner to
Xgo very fast.
X.IP -
X.B output()
Xis not supported.
XOutput from the ECHO macro is done to the file-pointer
X"yyout" (default
X.I stdout).
X.IP -
XIf you are providing your own yywrap() routine, you must "#undef yywrap"
Xfirst.
X.IP -
XTo refer to yytext outside of your scanner source file, use
X"extern char *yytext;" rather than "extern char yytext[];".
X.IP -
X.B yyleng
Xis a macro and not a variable, and hence cannot be accessed outside
Xof the scanner source file.
X.IP -
Xflex reads only one input file, while lex's input is made
Xup of the concatenation of its input files.
X.IP -
XThe name
X.bd
XFLEX_SCANNER
Xis #define'd so scanners may be written for use with either
Xflex or lex.
X.IP -
XThe macro
X.bd
XYY_USER_ACTION
Xcan be redefined to provide an action
Xwhich is always executed prior to the matched rule's action.  For example,
Xit could be #define'd to call a routine to convert yytext to lower-case,
Xor to copy yyleng to a global variable to make it accessible outside of
Xthe scanner source file.
X.IP -
XIn the generated scanner, rules are separated using
X.bd
XYY_BREAK
Xinstead of simple "break"'s.  This allows, for example, C++ users to
X#define YY_BREAK to do nothing (while being very careful that every
Xrule ends with a "break" or a "return"!) to avoid suffering from
Xunreachable statement warnings where a rule's action ends with "return".
X.SH ENHANCEMENTS
X.IP -
X.I Exclusive start-conditions
Xcan be declared by using
X.B %x
Xinstead of
X.B %s.
XThese start-conditions have the property that when they are active,
X.I no other rules are active.
XThus a set of rules governed by the same exclusive start condition
Xdescribe a scanner which is independent of any of the other rules in
Xthe flex input.  This feature makes it easy to specify "mini-scanners"
Xwhich scan portions of the input that are syntactically different
Xfrom the rest (e.g., comments).
X.IP -
X.I yyterminate()
Xcan be used in lieu of a return statement in an action.  It terminates
Xthe scanner and returns a 0 to the scanner's caller, indicating "all done".
X.IP -
X.I End-of-file rules.
XThe special rule "<<EOF>>" indicates
Xactions which are to be taken when an end-of-file is
Xencountered and yywrap() returns non-zero (i.e., indicates
Xno further files to process).  The action can either
Xpoint yyin at a new file to process, in which case the
Xaction should finish with
X.I YY_NEW_FILE
X(this is a branch, so subsequent code in the action won't
Xbe executed), or it should finish with a
X.I return
Xstatement.  <<EOF>> rules may not be used with other
Xpatterns; they may only be qualified with a list of start
Xconditions.  If an unqualified <<EOF>> rule is given, it
Xapplies only to the INITIAL start condition, and
X.I not
Xto
X.B %s
Xstart conditions.
XThese rules are useful for catching things like unclosed comments.
XAn example:
X.nf
X
X    %x quote
X    %%
X    ...
X    <quote><<EOF>>   {
X	     error( "unterminated quote" );
X	     yyterminate();
X	     }
X    <<EOF>>          {
X	     yyin = fopen( next_file, "r" );
X	     YY_NEW_FILE;
X	     }
X
X.fi
X.IP -
Xflex dynamically resizes its internal tables, so directives like "%a 3000"
Xare not needed when specifying large scanners.
X.IP -
XThe scanning routine generated by flex is declared using the macro
X.B YY_DECL.
XBy redefining this macro you can change the routine's name and
Xits calling sequence.  For example, you could use:
X.nf
X
X    #undef YY_DECL
X    #define YY_DECL float lexscan( a, b ) float a, b;
X
X.fi
Xto give it the name
X.I lexscan,
Xreturning a float, and taking two floats as arguments.  Note that
Xif you give arguments to the scanning routine, you must terminate
Xthe definition with a semi-colon (;).
X.IP -
Xflex generates
X.B #line
Xdirectives mapping lines in the output to
Xtheir origin in the input file.
X.IP -
XYou can put multiple actions on the same line, separated with
Xsemi-colons.  With lex, the following
X.nf
X
X    foo    handle_foo(); return 1;
X
X.fi
Xis truncated to
X.nf
X
X    foo    handle_foo();
X
X.fi
Xflex does not truncate the action.  Actions that are not enclosed in
Xbraces are terminated at the end of the line.
X.IP -
XActions can be begun with
X.B %{
Xand terminated with
X.B %}.
XIn this case, flex does not count braces to figure out where the
Xaction ends - actions are terminated by the closing
X.B %}.
XThis feature is useful when the enclosed action has extraneous
Xbraces in it (usually in comments or inside inactive #ifdef's)
Xthat throw off the brace-count.
X.IP -
XAll of the scanner actions (e.g.,
X.B ECHO, yywrap ...)
Xexcept the
X.B unput()
Xand
X.B input()
Xroutines,
Xare written as macros, so they can be redefined if necessary
Xwithout requiring a separate library to link to.
X.IP -
XWhen
X.B yywrap()
Xindicates that the scanner is done processing (it does this by returning
Xnon-zero), on subsequent calls the scanner will always immediately return
Xa value of 0.  To restart it on a new input file, the action
X.B yyrestart()
Xis used.  It takes one argument, the new input file.  It closes the
Xprevious yyin (unless stdin) and sets up the scanners internal variables
Xso that the next call to yylex() will start scanning the new file.  This
Xfunctionality is useful for, e.g., programs which will process a file, do some
Xwork, and then get a message to parse another file.
X.IP -
XFlex scans the code in section 1 (inside %{}'s) and the actions for
Xoccurrences of
X.I REJECT
Xand
X.I yymore().
XIf it doesn't see any, it assumes the features are not used and generates
Xhigher-performance scanners.  Flex tries to be correct in identifying
Xuses but can be fooled (for example, if a reference is made in a macro from
Xa #include file).  If this happens (a feature is used and flex didn't
Xrealize it) you will get a compile-time error of the form
X.nf
X
X    reject_used_but_not_detected undefined
X
X.fi
XYou can tell flex that a feature is used even if it doesn't think so
Xwith
X.B %used
Xfollowed by the name of the feature (for example, "%used REJECT");
Xsimilarly, you can specify that a feature is
X.I not
Xused even though it thinks it is with
X.B %unused.
X.IP -
XComments may be put in the first section of the input by preceding
Xthem with '#'.
X.SH FILES
X.TP
X.I flex.skel
Xskeleton scanner
X.TP
X.I lex.yy.c
Xgenerated scanner (called
X.I lexyy.c
Xon some systems).
X.TP
X.I lex.backtrack
Xbacktracking information for
X.B -b
Xflag (called
X.I lex.bck
Xon some systems).
X.SH "SEE ALSO"
X.LP
Xlex(1)
X.LP
XM. E. Lesk and E. Schmidt,
X.I LEX - Lexical Analyzer Generator
X.SH AUTHOR
XVern Paxson, with the help of many ideas and much inspiration from
XVan Jacobson.  Original version by Jef Poskanzer.  Fast table
Xrepresentation is a partial implementation of a design done by Van
XJacobson.  The implementation was done by Kevin Gong and Vern Paxson.
X.LP
XThanks to the many flex beta-testers and feedbackers, especially Casey
XLeedom, Frederic Brehm, Nick Christopher, Chris Faylor, Eric Goldman, Eric
XHughes, Greg Lee, Craig Leres, Mohamed el Lozy, Jim Meyering, Esmond Pitt,
XJef Poskanzer, and Dave Tallman.  Thanks to Keith Bostic, John Gilmore, Bob
XMulcahy, Rich Salz, and Richard Stallman for help with various distribution
Xheadaches.
X.LP
XSend comments to:
X.nf
X
X     Vern Paxson
X     Real Time Systems
X     Bldg. 46A
X     Lawrence Berkeley Laboratory
X     1 Cyclotron Rd.
X     Berkeley, CA 94720
X
X     (415) 486-6411
X
X     vern@csam.lbl.gov
X     vern@rtsg.ee.lbl.gov
X     ucbvax!csam.lbl.gov!vern
X
X.fi
XI will be gone from mid-July '89 through mid-August '89.  From August on,
Xthe addresses are:
X.nf
X
X     vern@cs.cornell.edu
X
X     Vern Paxson
X     CS Department
X     Grad Office
X     4126 Upson
X     Cornell University
X     Ithaca, NY 14853-7501
X
X     <no phone number yet>
X
X.fi
XEmail sent to the former addresses should continue to be forwarded for
Xquite a while.  Also, it looks like my username will be "paxson" and
Xnot "vern".  I'm planning on having a mail alias set up so "vern" will
Xstill work, but if you encounter problems try "paxson".
X.SH DIAGNOSTICS
X.LP
X.I flex scanner jammed -
Xa scanner compiled with
X.B -s
Xhas encountered an input string which wasn't matched by
Xany of its rules.
X.LP
X.I flex input buffer overflowed -
Xa scanner rule matched a string long enough to overflow the
Xscanner's internal input buffer (16K bytes - controlled by
X.B YY_BUF_MAX
Xin "flex.skel").
X.LP
X.I old-style lex command ignored -
Xthe flex input contains a lex command (e.g., "%n 1000") which
Xis being ignored.
X.SH BUGS
X.LP
XSome trailing context
Xpatterns cannot be properly matched and generate
Xwarning messages ("Dangerous trailing context").  These are
Xpatterns where the ending of the
Xfirst part of the rule matches the beginning of the second
Xpart, such as "zx*/xy*", where the 'x*' matches the 'x' at
Xthe beginning of the trailing context.  (Lex doesn't get these
Xpatterns right either.)
XIf desperate, you can use
X.B yyless()
Xto effect arbitrary trailing context.
X.LP
X.I variable
Xtrailing context (where both the leading and trailing parts do not have
Xa fixed length) entails the same performance loss as
X.I REJECT
X(i.e., substantial).
X.LP
XFor some trailing context rules, parts which are actually fixed-length are
Xnot recognized as such, leading to the abovementioned performance loss.
XIn particular, parts using '|' or {n} are always considered variable-length.
X.LP
XUse of unput() or input() trashes the current yytext and yyleng.
X.LP
XUse of unput() to push back more text than was matched can
Xresult in the pushed-back text matching a beginning-of-line ('^')
Xrule even though it didn't come at the beginning of the line.
X.LP
Xyytext and yyleng cannot be modified within a flex action.
X.LP
XNulls are not allowed in flex inputs or in the inputs to
Xscanners generated by flex.  Their presence generates fatal
Xerrors.
X.LP
XFlex does not generate correct #line directives for code internal
Xto the scanner; thus, bugs in
X.I
Xflex.skel
Xyield bogus line numbers.
X.LP
XPushing back definitions enclosed in ()'s can result in nasty,
Xdifficult-to-understand problems like:
X.nf
X
X	{DIG}  [0-9] /* a digit */
X
X.fi
XIn which the pushed-back text is "([0-9] /* a digit */)".
X.LP
XDue to both buffering of input and read-ahead, you cannot intermix
Xcalls to stdio routines, such as, for example,
X.B getchar()
Xwith flex rules and expect it to work.  Call
X.B input()
Xinstead.
X.LP
XThe total table entries listed by the
X.B -v
Xflag excludes the number of table entries needed to determine
Xwhat rule has been matched.  The number of entries is equal
Xto the number of DFA states if the scanner does not use REJECT,
Xand somewhat greater than the number of states if it does.
X.LP
XTo be consistent with ANSI C, the escape sequence \\xhh should
Xbe recognized for hexadecimal escape sequences, such as '\\x41' for 'A'.
X.LP
XIt would be useful if flex wrote to lex.yy.c a summary of the flags used in
Xits generation (such as which table compression options).
X.LP
XThe scanner run-time speeds still have not been optimized as much
Xas they deserve.  Van Jacobson's work shows that the can go
Xfaster still.
X.LP
XThe utility needs more complete documentation.
END_OF_FILE
if test 20602 -ne `wc -c <'flex/flex.1'`; then
    echo shar: \"'flex/flex.1'\" unpacked with wrong size!
fi
# end of 'flex/flex.1'
fi
if test -f 'flex/flexdef.h' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'flex/flexdef.h'\"
else
echo shar: Extracting \"'flex/flexdef.h'\" \(21874 characters\)
sed "s/^X//" >'flex/flexdef.h' <<'END_OF_FILE'
X/* flexdef - definitions file for flex */
X
X/*
X * Copyright (c) 1989 The Regents of the University of California.
X * All rights reserved.
X *
X * This code is derived from software contributed to Berkeley by
X * Vern Paxson.
X * 
X * The United States Government has rights in this work pursuant to
X * contract no. DE-AC03-76SF00098 between the United States Department of
X * Energy and the University of California.
X *
X * Redistribution and use in source and binary forms are permitted
X * provided that the above copyright notice and this paragraph are
X * duplicated in all such forms and that any documentation,
X * advertising materials, and other materials related to such
X * distribution and use acknowledge that the software was developed
X * by the University of California, Berkeley.  The name of the
X * University may not be used to endorse or promote products derived
X * from this software without specific prior written permission.
X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
X * IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
X * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
X */
X
X/* @(#) $Header: flexdef.h,v 2.0 89/06/20 15:49:50 vern Locked $ (LBL) */
X
X#ifndef FILE
X#include <stdio.h>
X#endif
X
X#ifdef SYS_V
X#include <string.h>
X
X#ifdef AMIGA
X#define bzero(s, n) setmem((char *)(s), (unsigned)(n), '\0')
X#define abs(x) ((x) < 0 ? -(x) : (x))
X#else
X#define bzero(s, n) memset((char *)(s), '\0', (unsigned)(n))
X#endif
X
X#ifndef VMS
Xchar *memset();
X#else
X/* memset is needed for old versions of the VMS C runtime library */
X#define memset(s, c, n) \
X	{ \
X	register char *t = s; \
X	register unsigned int m = n; \
X	while ( m-- > 0 ) \
X	    *t++ = c; \
X	}
X#define unlink delete
X#define SHORT_FILE_NAMES
X#endif
X#endif
X
X#ifndef SYS_V
X#include <strings.h>
X#ifdef lint
Xchar *sprintf(); /* keep lint happy */
X#endif
X#endif
X
X
X/* maximum line length we'll have to deal with */
X#define MAXLINE BUFSIZ
X
X/* maximum size of file name */
X#define FILENAMESIZE 1024
X
X#define min(x,y) ((x) < (y) ? (x) : (y))
X#define max(x,y) ((x) > (y) ? (x) : (y))
X
X#ifdef MS_DOS
X#define abs(x) ((x) < 0 ? -(x) : (x))
X#define SHORT_FILE_NAMES
X#endif
X
X#define true 1
X#define false 0
X
X
X#ifndef DEFAULT_SKELETON_FILE
X#define DEFAULT_SKELETON_FILE "flex.skel"
X#endif
X
X/* special chk[] values marking the slots taking by end-of-buffer and action
X * numbers
X */
X#define EOB_POSITION -1
X#define ACTION_POSITION -2
X
X/* number of data items per line for -f output */
X#define NUMDATAITEMS 10
X
X/* number of lines of data in -f output before inserting a blank line for
X * readability.
X */
X#define NUMDATALINES 10
X
X/* transition_struct_out() definitions */
X#define TRANS_STRUCT_PRINT_LENGTH 15
X
X/* returns true if an nfa state has an epsilon out-transition slot
X * that can be used.  This definition is currently not used.
X */
X#define FREE_EPSILON(state) \
X	(transchar[state] == SYM_EPSILON && \
X	 trans2[state] == NO_TRANSITION && \
X	 finalst[state] != state)
X
X/* returns true if an nfa state has an epsilon out-transition character
X * and both slots are free
X */
X#define SUPER_FREE_EPSILON(state) \
X	(transchar[state] == SYM_EPSILON && \
X	 trans1[state] == NO_TRANSITION) \
X
X/* maximum number of NFA states that can comprise a DFA state.  It's real
X * big because if there's a lot of rules, the initial state will have a
X * huge epsilon closure.
X */
X#define INITIAL_MAX_DFA_SIZE 750
X#define MAX_DFA_SIZE_INCREMENT 750
X
X/* array names to be used in generated machine.  They're short because
X * we write out one data statement (which names the array) for each element
X * in the array.
X */
X
X/* points to list of rules accepted for a state */
X#define ALIST "yy_accept"
X#define ACCEPT "yy_acclist"	/* list of rules accepted for a state */
X#define ECARRAY "yy_ec"	/* maps input characters to equivalence classes */
X/* maps equivalence classes to meta-equivalence classes */
X#define MATCHARRAY "yy_meta"
X#define BASEARRAY "yy_base"	/* "base" array */
X#define DEFARRAY "yy_def"	/* "default" array */
X#define NEXTARRAY "yy_nxt"	/* "next" array */
X#define CHECKARRAY "yy_chk"	/* "check" array */
X
X
X/* a note on the following masks.  They are used to mark accepting numbers
X * as being special.  As such, they implicitly limit the number of accepting
X * numbers (i.e., rules) because if there are too many rules the rule numbers
X * will overload the mask bits.  Fortunately, this limit is \large/ (0x2000 ==
X * 8192) so unlikely to actually cause any problems.  A check is made in
X * new_rule() to ensure that this limit is not reached.
X */
X
X/* mask to mark a trailing context accepting number */
X#define YY_TRAILING_MASK 0x2000
X
X/* mask to mark the accepting number of the "head" of a trailing context rule */
X#define YY_TRAILING_HEAD_MASK 0x4000
X
X/* maximum number of rules, as outlined in the above note */
X#define MAX_RULE (YY_TRAILING_MASK - 1)
X
X
X/* NIL must be 0.  If not, its special meaning when making equivalence classes
X * (it marks the representative of a given e.c.) will be unidentifiable
X */
X#define NIL 0
X
X#define JAM -1	/* to mark a missing DFA transition */
X#define NO_TRANSITION NIL
X#define UNIQUE -1	/* marks a symbol as an e.c. representative */
X#define INFINITY -1	/* for x{5,} constructions */
X
X/* size of input alphabet - should be size of ASCII set */
X#define CSIZE 127
X
X#define INITIAL_MAX_CCLS 100	/* max number of unique character classes */
X#define MAX_CCLS_INCREMENT 100
X
X/* size of table holding members of character classes */
X#define INITIAL_MAX_CCL_TBL_SIZE 500
X#define MAX_CCL_TBL_SIZE_INCREMENT 250
X
X#define INITIAL_MAX_RULES 100	/* default maximum number of rules */
X#define MAX_RULES_INCREMENT 100
X
X#define INITIAL_MNS 2000	/* default maximum number of nfa states */
X#define MNS_INCREMENT 1000	/* amount to bump above by if it's not enough */
X
X#define INITIAL_MAX_DFAS 1000	/* default maximum number of dfa states */
X#define MAX_DFAS_INCREMENT 1000
X
X#define JAMSTATE -32766	/* marks a reference to the state that always jams */
X
X/* enough so that if it's subtracted from an NFA state number, the result
X * is guaranteed to be negative
X */
X#define MARKER_DIFFERENCE 32000
X#define MAXIMUM_MNS 31999
X
X/* maximum number of nxt/chk pairs for non-templates */
X#define INITIAL_MAX_XPAIRS 2000
X#define MAX_XPAIRS_INCREMENT 2000
X
X/* maximum number of nxt/chk pairs needed for templates */
X#define INITIAL_MAX_TEMPLATE_XPAIRS 2500
X#define MAX_TEMPLATE_XPAIRS_INCREMENT 2500
X
X#define SYM_EPSILON 0	/* to mark transitions on the symbol epsilon */
X
X#define INITIAL_MAX_SCS 40	/* maximum number of start conditions */
X#define MAX_SCS_INCREMENT 40	/* amount to bump by if it's not enough */
X
X#define ONE_STACK_SIZE 500	/* stack of states with only one out-transition */
X#define SAME_TRANS -1	/* transition is the same as "default" entry for state */
X
X/* the following percentages are used to tune table compression:
X
X * the percentage the number of out-transitions a state must be of the
X * number of equivalence classes in order to be considered for table
X * compaction by using protos
X */
X#define PROTO_SIZE_PERCENTAGE 15
X
X/* the percentage the number of homogeneous out-transitions of a state
X * must be of the number of total out-transitions of the state in order
X * that the state's transition table is first compared with a potential 
X * template of the most common out-transition instead of with the first
X * proto in the proto queue
X */
X#define CHECK_COM_PERCENTAGE 50
X
X/* the percentage the number of differences between a state's transition
X * table and the proto it was first compared with must be of the total
X * number of out-transitions of the state in order to keep the first
X * proto as a good match and not search any further
X */
X#define FIRST_MATCH_DIFF_PERCENTAGE 10
X
X/* the percentage the number of differences between a state's transition
X * table and the most similar proto must be of the state's total number
X * of out-transitions to use the proto as an acceptable close match
X */
X#define ACCEPTABLE_DIFF_PERCENTAGE 50
X
X/* the percentage the number of homogeneous out-transitions of a state
X * must be of the number of total out-transitions of the state in order
X * to consider making a template from the state
X */
X#define TEMPLATE_SAME_PERCENTAGE 60
X
X/* the percentage the number of differences between a state's transition
X * table and the most similar proto must be of the state's total number
X * of out-transitions to create a new proto from the state
X */
X#define NEW_PROTO_DIFF_PERCENTAGE 20
X
X/* the percentage the total number of out-transitions of a state must be
X * of the number of equivalence classes in order to consider trying to
X * fit the transition table into "holes" inside the nxt/chk table.
X */
X#define INTERIOR_FIT_PERCENTAGE 15
X
X/* size of region set aside to cache the complete transition table of
X * protos on the proto queue to enable quick comparisons
X */
X#define PROT_SAVE_SIZE 2000
X
X#define MSP 50	/* maximum number of saved protos (protos on the proto queue) */
X
X/* maximum number of out-transitions a state can have that we'll rummage
X * around through the interior of the internal fast table looking for a
X * spot for it
X */
X#define MAX_XTIONS_FULL_INTERIOR_FIT 4
X
X/* maximum number of rules which will be reported as being associated
X * with a DFA state
X */
X#define MAX_ASSOC_RULES 100
X
X/* number that, if used to subscript an array, has a good chance of producing
X * an error; should be small enough to fit into a short
X */
X#define BAD_SUBSCRIPT -32767
X
X/* absolute value of largest number that can be stored in a short, with a
X * bit of slop thrown in for general paranoia.
X */
X#define MAX_SHORT 32766
X
X
X/* Declarations for global variables. */
X
X/* variables for symbol tables:
X * sctbl - start-condition symbol table
X * ndtbl - name-definition symbol table
X * ccltab - character class text symbol table
X */
X
Xstruct hash_entry
X    {
X    struct hash_entry *prev, *next;
X    char *name;
X    char *str_val;
X    int int_val;
X    } ;
X
Xtypedef struct hash_entry *hash_table[];
X
X#define NAME_TABLE_HASH_SIZE 101
X#define START_COND_HASH_SIZE 101
X#define CCL_HASH_SIZE 101
X
Xextern struct hash_entry *ndtbl[NAME_TABLE_HASH_SIZE]; 
Xextern struct hash_entry *sctbl[START_COND_HASH_SIZE];
Xextern struct hash_entry *ccltab[CCL_HASH_SIZE];
X
X
X/* variables for flags:
X * printstats - if true (-v), dump statistics
X * syntaxerror - true if a syntax error has been found
X * eofseen - true if we've seen an eof in the input file
X * ddebug - if true (-d), make a "debug" scanner
X * trace - if true (-T), trace processing
X * spprdflt - if true (-s), suppress the default rule
X * interactive - if true (-I), generate an interactive scanner
X * caseins - if true (-i), generate a case-insensitive scanner
X * useecs - if true (-ce flag), use equivalence classes
X * fulltbl - if true (-cf flag), don't compress the DFA state table
X * usemecs - if true (-cm flag), use meta-equivalence classes
X * fullspd - if true (-F flag), use Jacobson method of table representation
X * gen_line_dirs - if true (i.e., no -L flag), generate #line directives
X * performance_report - if true (i.e., -p flag), generate a report relating
X *   to scanner performance
X * backtrack_report - if true (i.e., -b flag), generate "lex.backtrack" file
X *   listing backtracking states
X * yymore_used - if true, yymore() is used in input rules
X * reject - if true, generate backtracking tables for REJECT macro
X * real_reject - if true, scanner really uses REJECT (as opposed to just
X *               having "reject" set for variable trailing context)
X * continued_action - true if this rule's action is to "fall through" to
X *                    the next rule's action (i.e., the '|' action)
X * yymore_really_used - has a REALLY_xxx value indicating whether a
X *                      %used or %notused was used with yymore()
X * reject_really_used - same for REJECT
X */
X
Xextern int printstats, syntaxerror, eofseen, ddebug, trace, spprdflt;
Xextern int interactive, caseins, useecs, fulltbl, usemecs;
Xextern int fullspd, gen_line_dirs, performance_report, backtrack_report;
Xextern int yymore_used, reject, real_reject, continued_action;
X
X#define REALLY_NOT_DETERMINED 0
X#define REALLY_USED 1
X#define REALLY_NOT_USED 2
Xextern int yymore_really_used, reject_really_used;
X
X
X/* variables used in the flex input routines:
X * datapos - characters on current output line
X * dataline - number of contiguous lines of data in current data
X *    statement.  Used to generate readable -f output
X * skelfile - the skeleton file
X * yyin - input file
X * temp_action_file - temporary file to hold actions
X * backtrack_file - file to summarize backtracking states to
X * action_file_name - name of the temporary file
X * infilename - name of input file
X * linenum - current input line number
X */
X
Xextern int datapos, dataline, linenum;
Xextern FILE *skelfile, *yyin, *temp_action_file, *backtrack_file;
Xextern char *infilename;
Xextern char action_file_name[];
X
X
X/* variables for stack of states having only one out-transition:
X * onestate - state number
X * onesym - transition symbol
X * onenext - target state
X * onedef - default base entry
X * onesp - stack pointer
X */
X
Xextern int onestate[ONE_STACK_SIZE], onesym[ONE_STACK_SIZE];
Xextern int onenext[ONE_STACK_SIZE], onedef[ONE_STACK_SIZE], onesp;
X
X
X/* variables for nfa machine data:
X * current_mns - current maximum on number of NFA states
X * num_rules - number of the last accepting state; also is number of
X *             rules created so far
X * current_max_rules - current maximum number of rules
X * lastnfa - last nfa state number created
X * firstst - physically the first state of a fragment
X * lastst - last physical state of fragment
X * finalst - last logical state of fragment
X * transchar - transition character
X * trans1 - transition state
X * trans2 - 2nd transition state for epsilons
X * accptnum - accepting number
X * assoc_rule - rule associated with this NFA state (or 0 if none)
X * state_type - a STATE_xxx type identifying whether the state is part
X *              of a normal rule, the leading state in a trailing context
X *              rule (i.e., the state which marks the transition from
X *              recognizing the text-to-be-matched to the beginning of
X *              the trailing context), or a subsequent state in a trailing
X *              context rule
X * rule_type - a RULE_xxx type identifying whether this a a ho-hum
X *             normal rule or one which has variable head & trailing
X *             context
X * rule_linenum - line number associated with rule
X */
X
Xextern int current_mns, num_rules, current_max_rules, lastnfa;
Xextern int *firstst, *lastst, *finalst, *transchar, *trans1, *trans2;
Xextern int *accptnum, *assoc_rule, *state_type, *rule_type, *rule_linenum;
X
X/* different types of states; values are useful as masks, as well, for
X * routines like check_trailing_context()
X */
X#define STATE_NORMAL 0x1
X#define STATE_TRAILING_CONTEXT 0x2
X
X/* global holding current type of state we're making */
X
Xextern int current_state_type;
X
X/* different types of rules */
X#define RULE_NORMAL 0
X#define RULE_VARIABLE 1
X
X/* true if the input rules include a rule with both variable-length head
X * and trailing context, false otherwise
X */
Xextern int variable_trailing_context_rules;
X
X
X/* variables for protos:
X * numtemps - number of templates created
X * numprots - number of protos created
X * protprev - backlink to a more-recently used proto
X * protnext - forward link to a less-recently used proto
X * prottbl - base/def table entry for proto
X * protcomst - common state of proto
X * firstprot - number of the most recently used proto
X * lastprot - number of the least recently used proto
X * protsave contains the entire state array for protos
X */
X
Xextern int numtemps, numprots, protprev[MSP], protnext[MSP], prottbl[MSP];
Xextern int protcomst[MSP], firstprot, lastprot, protsave[PROT_SAVE_SIZE];
X
X
X/* variables for managing equivalence classes:
X * numecs - number of equivalence classes
X * nextecm - forward link of Equivalence Class members
X * ecgroup - class number or backward link of EC members
X * nummecs - number of meta-equivalence classes (used to compress
X *   templates)
X * tecfwd - forward link of meta-equivalence classes members
X * tecbck - backward link of MEC's
X */
X
Xextern int numecs, nextecm[CSIZE + 1], ecgroup[CSIZE + 1], nummecs;
Xextern int tecfwd[CSIZE + 1], tecbck[CSIZE + 1];
X
X
X/* variables for start conditions:
X * lastsc - last start condition created
X * current_max_scs - current limit on number of start conditions
X * scset - set of rules active in start condition
X * scbol - set of rules active only at the beginning of line in a s.c.
X * scxclu - true if start condition is exclusive
X * sceof - true if start condition has EOF rule
X * scname - start condition name
X * actvsc - stack of active start conditions for the current rule
X */
X
Xextern int lastsc, current_max_scs, *scset, *scbol, *scxclu, *sceof, *actvsc;
Xextern char **scname;
X
X
X/* variables for dfa machine data:
X * current_max_dfa_size - current maximum number of NFA states in DFA
X * current_max_xpairs - current maximum number of non-template xtion pairs
X * current_max_template_xpairs - current maximum number of template pairs
X * current_max_dfas - current maximum number DFA states
X * lastdfa - last dfa state number created
X * nxt - state to enter upon reading character
X * chk - check value to see if "nxt" applies
X * tnxt - internal nxt table for templates
X * base - offset into "nxt" for given state
X * def - where to go if "chk" disallows "nxt" entry
X * tblend - last "nxt/chk" table entry being used
X * firstfree - first empty entry in "nxt/chk" table
X * dss - nfa state set for each dfa
X * dfasiz - size of nfa state set for each dfa
X * dfaacc - accepting set for each dfa state (or accepting number, if
X *    -r is not given)
X * accsiz - size of accepting set for each dfa state
X * dhash - dfa state hash value
X * numas - number of DFA accepting states created; note that this
X *    is not necessarily the same value as num_rules, which is the analogous
X *    value for the NFA
X * numsnpairs - number of state/nextstate transition pairs
X * jambase - position in base/def where the default jam table starts
X * jamstate - state number corresponding to "jam" state
X * end_of_buffer_state - end-of-buffer dfa state number
X */
X
Xextern int current_max_dfa_size, current_max_xpairs;
Xextern int current_max_template_xpairs, current_max_dfas;
Xextern int lastdfa, lasttemp, *nxt, *chk, *tnxt;
Xextern int *base, *def, tblend, firstfree, **dss, *dfasiz;
Xextern union dfaacc_union
X    {
X    int *dfaacc_set;
X    int dfaacc_state;
X    } *dfaacc;
Xextern int *accsiz, *dhash, numas;
Xextern int numsnpairs, jambase, jamstate;
Xextern int end_of_buffer_state;
X
X/* variables for ccl information:
X * lastccl - ccl index of the last created ccl
X * current_maxccls - current limit on the maximum number of unique ccl's
X * cclmap - maps a ccl index to its set pointer
X * ccllen - gives the length of a ccl
X * cclng - true for a given ccl if the ccl is negated
X * cclreuse - counts how many times a ccl is re-used
X * current_max_ccl_tbl_size - current limit on number of characters needed
X *	to represent the unique ccl's
X * ccltbl - holds the characters in each ccl - indexed by cclmap
X */
X
Xextern int lastccl, current_maxccls, *cclmap, *ccllen, *cclng, cclreuse;
Xextern int current_max_ccl_tbl_size;
Xextern char *ccltbl;
X
X
X/* variables for miscellaneous information:
X * starttime - real-time when we started
X * endtime - real-time when we ended
X * nmstr - last NAME scanned by the scanner
X * sectnum - section number currently being parsed
X * nummt - number of empty nxt/chk table entries
X * hshcol - number of hash collisions detected by snstods
X * dfaeql - number of times a newly created dfa was equal to an old one
X * numeps - number of epsilon NFA states created
X * eps2 - number of epsilon states which have 2 out-transitions
X * num_reallocs - number of times it was necessary to realloc() a group
X *		  of arrays
X * tmpuses - number of DFA states that chain to templates
X * totnst - total number of NFA states used to make DFA states
X * peakpairs - peak number of transition pairs we had to store internally
X * numuniq - number of unique transitions
X * numdup - number of duplicate transitions
X * hshsave - number of hash collisions saved by checking number of states
X * num_backtracking - number of DFA states requiring back-tracking
X * bol_needed - whether scanner needs beginning-of-line recognition
X */
X
Xextern char *starttime, *endtime, nmstr[MAXLINE];
Xextern int sectnum, nummt, hshcol, dfaeql, numeps, eps2, num_reallocs;
Xextern int tmpuses, totnst, peakpairs, numuniq, numdup, hshsave;
Xextern int num_backtracking, bol_needed;
X
Xchar *allocate_array(), *reallocate_array();
X
X#define allocate_integer_array(size) \
X	(int *) allocate_array( size, sizeof( int ) )
X
X#define reallocate_integer_array(array,size) \
X	(int *) reallocate_array( (char *) array, size, sizeof( int ) )
X
X#define allocate_int_ptr_array(size) \
X	(int **) allocate_array( size, sizeof( int * ) )
X
X#define allocate_char_ptr_array(size) \
X	(char **) allocate_array( size, sizeof( char * ) )
X
X#define allocate_dfaacc_union(size) \
X	(union dfaacc_union *) \
X		allocate_array( size, sizeof( union dfaacc_union ) )
X
X#define reallocate_int_ptr_array(array,size) \
X	(int **) reallocate_array( (char *) array, size, sizeof( int * ) )
X
X#define reallocate_char_ptr_array(array,size) \
X	(char **) reallocate_array( (char *) array, size, sizeof( char * ) )
X
X#define reallocate_dfaacc_union(array, size) \
X	(union dfaacc_union *)  reallocate_array( (char *) array, size, sizeof( union dfaacc_union ) )
X
X#define allocate_character_array(size) allocate_array( size, sizeof( char ) )
X
X#define reallocate_character_array(array,size) \
X	reallocate_array( array, size, sizeof( char ) )
X
X
X/* used to communicate between scanner and parser.  The type should really
X * be YYSTYPE, but we can't easily get our hands on it.
X */
Xextern int yylval;
END_OF_FILE
if test 21874 -ne `wc -c <'flex/flexdef.h'`; then
    echo shar: \"'flex/flexdef.h'\" unpacked with wrong size!
fi
# end of 'flex/flexdef.h'
fi
echo shar: End of archive 4 \(of 7\).
cp /dev/null ark4isdone
MISSING=""
for I in 1 2 3 4 5 6 7 ; do
    if test ! -f ark${I}isdone ; then
	MISSING="${MISSING} ${I}"
    fi
done
if test "${MISSING}" = "" ; then
    echo You have unpacked all 7 archives.
    rm -f ark[1-9]isdone
else
    echo You still need to unpack the following archives:
    echo "        " ${MISSING}
fi
##  End of shell archive.
exit 0
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.