[comp.sources.amiga] v90i238: flex 2.3 - fast lexical analyzer generator, Part11/13

amiga-request@abcfd20.larc.nasa.gov (Amiga Sources/Binaries Moderator) (08/20/90)
Submitted-by: loftus@wpllabs.uucp (William P Loftus)
Posting-number: Volume 90, Issue 238
Archive-name: unix/flex-2.3/part11

#!/bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of archive 11 (of 13)."
# Contents:  flexdoc.1
# Wrapped by tadguy@abcfd20 on Sun Aug 19 18:41:49 1990
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'flexdoc.1' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'flexdoc.1'\"
else
echo shar: Extracting \"'flexdoc.1'\" \(65353 characters\)
sed "s/^X//" >'flexdoc.1' <<'END_OF_FILE'
X.TH FLEX 1 "26 May 1990" "Version 2.3"
X.SH NAME
Xflex - fast lexical analyzer generator
X.SH SYNOPSIS
X.B flex
X.B [-bcdfinpstvFILT8 -C[efmF] -Sskeleton]
X.I [filename ...]
X.SH DESCRIPTION
X.I flex
Xis a tool for generating
X.I scanners:
Xprograms which recognized lexical patterns in text.
X.I flex
Xreads
Xthe given input files, or its standard input if no file names are given,
Xfor a description of a scanner to generate.  The description is in
Xthe form of pairs
Xof regular expressions and C code, called
X.I rules.  flex
Xgenerates as output a C source file,
X.B lex.yy.c,
Xwhich defines a routine
X.B yylex().
XThis file is compiled and linked with the
X.B -lfl
Xlibrary to produce an executable.  When the executable is run,
Xit analyzes its input for occurrences
Xof the regular expressions.  Whenever it finds one, it executes
Xthe corresponding C code.
X.SH SOME SIMPLE EXAMPLES
X.LP
XFirst some simple examples to get the flavor of how one uses
X.I flex.
XThe following
X.I flex
Xinput specifies a scanner which whenever it encounters the string
X"username" will replace it with the user's login name:
X.nf
X
X    %%
X    username    printf( "%s", getlogin() );
X
X.fi
XBy default, any text not matched by a
X.I flex
Xscanner
Xis copied to the output, so the net effect of this scanner is
Xto copy its input file to its output with each occurrence
Xof "username" expanded.
XIn this input, there is just one rule.  "username" is the
X.I pattern
Xand the "printf" is the
X.I action.
XThe "%%" marks the beginning of the rules.
X.LP
XHere's another simple example:
X.nf
X
X        int num_lines = 0, num_chars = 0;
X
X    %%
X    \\n    ++num_lines; ++num_chars;
X    .     ++num_chars;
X
X    %%
X    main()
X        {
X        yylex();
X        printf( "# of lines = %d, # of chars = %d\\n",
X                num_lines, num_chars );
X        }
X
X.fi
XThis scanner counts the number of characters and the number
Xof lines in its input (it produces no output other than the
Xfinal report on the counts).  The first line
Xdeclares two globals, "num_lines" and "num_chars", which are accessible
Xboth inside
X.B yylex()
Xand in the
X.B main()
Xroutine declared after the second "%%".  There are two rules, one
Xwhich matches a newline ("\\n") and increments both the line count and
Xthe character count, and one which matches any character other than
Xa newline (indicated by the "." regular expression).
X.LP
XA somewhat more complicated example:
X.nf
X
X    /* scanner for a toy Pascal-like language */
X
X    %{
X    /* need this for the call to atof() below */
X    #include <math.h>
X    %}
X
X    DIGIT    [0-9]
X    ID       [a-z][a-z0-9]*
X
X    %%
X
X    {DIGIT}+    {
X                printf( "An integer: %s (%d)\\n", yytext,
X                        atoi( yytext ) );
X                }
X
X    {DIGIT}+"."{DIGIT}*        {
X                printf( "A float: %s (%g)\\n", yytext,
X                        atof( yytext ) );
X                }
X
X    if|then|begin|end|procedure|function        {
X                printf( "A keyword: %s\\n", yytext );
X                }
X
X    {ID}        printf( "An identifier: %s\\n", yytext );
X
X    "+"|"-"|"*"|"/"   printf( "An operator: %s\\n", yytext );
X
X    "{"[^}\\n]*"}"     /* eat up one-line comments */
X
X    [ \\t\\n]+          /* eat up whitespace */
X
X    .           printf( "Unrecognized character: %s\\n", yytext );
X
X    %%
X
X    main( argc, argv )
X    int argc;
X    char **argv;
X        {
X        ++argv, --argc;  /* skip over program name */
X        if ( argc > 0 )
X                yyin = fopen( argv[0], "r" );
X        else
X                yyin = stdin;
X        
X        yylex();
X        }
X
X.fi
XThis is the beginnings of a simple scanner for a language like
XPascal.  It identifies different types of
X.I tokens
Xand reports on what it has seen.
X.LP
XThe details of this example will be explained in the following
Xsections.
X.SH FORMAT OF THE INPUT FILE
XThe
X.I flex
Xinput file consists of three sections, separated by a line with just
X.B %%
Xin it:
X.nf
X
X    definitions
X    %%
X    rules
X    %%
X    user code
X
X.fi
XThe
X.I definitions
Xsection contains declarations of simple
X.I name
Xdefinitions to simplify the scanner specification, and declarations of
X.I start conditions,
Xwhich are explained in a later section.
X.LP
XName definitions have the form:
X.nf
X
X    name definition
X
X.fi
XThe "name" is a word beginning with a letter or an underscore ('_')
Xfollowed by zero or more letters, digits, '_', or '-' (dash).
XThe definition is taken to begin at the first non-white-space character
Xfollowing the name and continuing to the end of the line.
XThe definition can subsequently be referred to using "{name}", which
Xwill expand to "(definition)".  For example,
X.nf
X
X    DIGIT    [0-9]
X    ID       [a-z][a-z0-9]*
X
X.fi
Xdefines "DIGIT" to be a regular expression which matches a
Xsingle digit, and
X"ID" to be a regular expression which matches a letter
Xfollowed by zero-or-more letters-or-digits.
XA subsequent reference to
X.nf
X
X    {DIGIT}+"."{DIGIT}*
X
X.fi
Xis identical to
X.nf
X
X    ([0-9])+"."([0-9])*
X
X.fi
Xand matches one-or-more digits followed by a '.' followed
Xby zero-or-more digits.
X.LP
XThe
X.I rules
Xsection of the
X.I flex
Xinput contains a series of rules of the form:
X.nf
X
X    pattern   action
X
X.fi
Xwhere the pattern must be unindented and the action must begin
Xon the same line.
X.LP
XSee below for a further description of patterns and actions.
X.LP
XFinally, the user code section is simply copied to
X.B lex.yy.c
Xverbatim.
XIt is used for companion routines which call or are called
Xby the scanner.  The presence of this section is optional;
Xif it is missing, the second
X.B %%
Xin the input file may be skipped, too.
X.LP
XIn the definitions and rules sections, any
X.I indented
Xtext or text enclosed in
X.B %{
Xand
X.B %}
Xis copied verbatim to the output (with the %{}'s removed).
XThe %{}'s must appear unindented on lines by themselves.
X.LP
XIn the rules section,
Xany indented or %{} text appearing before the
Xfirst rule may be used to declare variables
Xwhich are local to the scanning routine and (after the declarations)
Xcode which is to be executed whenever the scanning routine is entered.
XOther indented or %{} text in the rule section is still copied to the output,
Xbut its meaning is not well-defined and it may well cause compile-time
Xerrors (this feature is present for
X.I POSIX
Xcompliance; see below for other such features).
X.LP
XIn the definitions section, an unindented comment (i.e., a line
Xbeginning with "/*") is also copied verbatim to the output up
Xto the next "*/".  Also, any line in the definitions section
Xbeginning with '#' is ignored, though this style of comment is
Xdeprecated and may go away in the future.
X.SH PATTERNS
XThe patterns in the input are written using an extended set of regular
Xexpressions.  These are:
X.nf
X
X    x          match the character 'x'
X    .          any character except newline
X    [xyz]      a "character class"; in this case, the pattern
X                 matches either an 'x', a 'y', or a 'z'
X    [abj-oZ]   a "character class" with a range in it; matches
X                 an 'a', a 'b', any letter from 'j' through 'o',
X                 or a 'Z'
X    [^A-Z]     a "negated character class", i.e., any character
X                 but those in the class.  In this case, any
X                 character EXCEPT an uppercase letter.
X    [^A-Z\\n]   any character EXCEPT an uppercase letter or
X                 a newline
X    r*         zero or more r's, where r is any regular expression
X    r+         one or more r's
X    r?         zero or one r's (that is, "an optional r")
X    r{2,5}     anywhere from two to five r's
X    r{2,}      two or more r's
X    r{4}       exactly 4 r's
X    {name}     the expansion of the "name" definition
X               (see above)
X    "[xyz]\\"foo"
X               the literal string: [xyz]"foo
X    \\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
X                 then the ANSI-C interpretation of \\x.
X                 Otherwise, a literal 'X' (used to escape
X                 operators such as '*')
X    \\123       the character with octal value 123
X    \\x2a       the character with hexadecimal value 2a
X    (r)        match an r; parentheses are used to override
X                 precedence (see below)
X
X
X    rs         the regular expression r followed by the
X                 regular expression s; called "concatenation"
X
X
X    r|s        either an r or an s
X
X
X    r/s        an r but only if it is followed by an s.  The
X                 s is not part of the matched text.  This type
X                 of pattern is called as "trailing context".
X    ^r         an r, but only at the beginning of a line
X    r$         an r, but only at the end of a line.  Equivalent
X                 to "r/\\n".
X
X
X    <s>r       an r, but only in start condition s (see
X               below for discussion of start conditions)
X    <s1,s2,s3>r
X               same, but in any of start conditions s1,
X               s2, or s3
X
X
X    <<EOF>>    an end-of-file
X    <s1,s2><<EOF>>
X               an end-of-file when in start condition s1 or s2
X
X.fi
XThe regular expressions listed above are grouped according to
Xprecedence, from highest precedence at the top to lowest at the bottom.
XThose grouped together have equal precedence.  For example,
X.nf
X
X    foo|bar*
X
X.fi
Xis the same as
X.nf
X
X    (foo)|(ba(r*))
X
X.fi
Xsince the '*' operator has higher precedence than concatenation,
Xand concatenation higher than alternation ('|').  This pattern
Xtherefore matches
X.I either
Xthe string "foo"
X.I or
Xthe string "ba" followed by zero-or-more r's.
XTo match "foo" or zero-or-more "bar"'s, use:
X.nf
X
X    foo|(bar)*
X
X.fi
Xand to match zero-or-more "foo"'s-or-"bar"'s:
X.nf
X
X    (foo|bar)*
X
X.fi
X.LP
XSome notes on patterns:
X.IP -
XA negated character class such as the example "[^A-Z]"
Xabove
X.I will match a newline
Xunless "\\n" (or an equivalent escape sequence) is one of the
Xcharacters explicitly present in the negated character class
X(e.g., "[^A-Z\\n]").  This is unlike how many other regular
Xexpression tools treat negated character classes, but unfortunately
Xthe inconsistency is historically entrenched.
XMatching newlines means that a pattern like [^"]* can match an entire
Xinput (overflowing the scanner's input buffer) unless there's another
Xquote in the input.
X.IP -
XA rule can have at most one instance of trailing context (the '/' operator
Xor the '$' operator).  The start condition, '^', and "<<EOF>>" patterns
Xcan only occur at the beginning of a pattern, and, as well as with '/' and '$',
Xcannot be grouped inside parentheses.  A '^' which does not occur at
Xthe beginning of a rule or a '$' which does not occur at the end of
Xa rule loses its special properties and is treated as a normal character.
X.IP
XThe following are illegal:
X.nf
X
X    foo/bar$
X    <sc1>foo<sc2>bar
X
X.fi
XNote that the first of these, can be written "foo/bar\\n".
X.IP
XThe following will result in '$' or '^' being treated as a normal character:
X.nf
X
X    foo|(bar$)
X    foo|^bar
X
X.fi
XIf what's wanted is a "foo" or a bar-followed-by-a-newline, the following
Xcould be used (the special '|' action is explained below):
X.nf
X
X    foo      |
X    bar$     /* action goes here */
X
X.fi
XA similar trick will work for matching a foo or a
Xbar-at-the-beginning-of-a-line.
X.SH HOW THE INPUT IS MATCHED
XWhen the generated scanner is run, it analyzes its input looking
Xfor strings which match any of its patterns.  If it finds more than
Xone match, it takes the one matching the most text (for trailing
Xcontext rules, this includes the length of the trailing part, even
Xthough it will then be returned to the input).  If it finds two
Xor more matches of the same length, the
Xrule listed first in the
X.I flex
Xinput file is chosen.
X.LP
XOnce the match is determined, the text corresponding to the match
X(called the
X.I token)
Xis made available in the global character pointer
X.B yytext,
Xand its length in the global integer
X.B yyleng.
XThe
X.I action
Xcorresponding to the matched pattern is then executed (a more
Xdetailed description of actions follows), and then the remaining
Xinput is scanned for another match.
X.LP
XIf no match is found, then the
X.I default rule
Xis executed: the next character in the input is considered matched and
Xcopied to the standard output.  Thus, the simplest legal
X.I flex
Xinput is:
X.nf
X
X    %%
X
X.fi
Xwhich generates a scanner that simply copies its input (one character
Xat a time) to its output.
X.SH ACTIONS
XEach pattern in a rule has a corresponding action, which can be any
Xarbitrary C statement.  The pattern ends at the first non-escaped
Xwhitespace character; the remainder of the line is its action.  If the
Xaction is empty, then when the pattern is matched the input token
Xis simply discarded.  For example, here is the specification for a program
Xwhich deletes all occurrences of "zap me" from its input:
X.nf
X
X    %%
X    "zap me"
X
X.fi
X(It will copy all other characters in the input to the output since
Xthey will be matched by the default rule.)
X.LP
XHere is a program which compresses multiple blanks and tabs down to
Xa single blank, and throws away whitespace found at the end of a line:
X.nf
X
X    %%
X    [ \\t]+        putchar( ' ' );
X    [ \\t]+$       /* ignore this token */
X
X.fi
X.LP
XIf the action contains a '{', then the action spans till the balancing '}'
Xis found, and the action may cross multiple lines.
X.I flex 
Xknows about C strings and comments and won't be fooled by braces found
Xwithin them, but also allows actions to begin with
X.B %{
Xand will consider the action to be all the text up to the next
X.B %}
X(regardless of ordinary braces inside the action).
X.LP
XAn action consisting solely of a vertical bar ('|') means "same as
Xthe action for the next rule."  See below for an illustration.
X.LP
XActions can include arbitrary C code, including
X.B return
Xstatements to return a value to whatever routine called
X.B yylex().
XEach time
X.B yylex()
Xis called it continues processing tokens from where it last left
Xoff until it either reaches
Xthe end of the file or executes a return.  Once it reaches an end-of-file,
Xhowever, then any subsequent call to
X.B yylex()
Xwill simply immediately return, unless
X.B yyrestart()
Xis first called (see below).
X.LP
XActions are not allowed to modify yytext or yyleng.
X.LP
XThere are a number of special directives which can be included within
Xan action:
X.IP -
X.B ECHO
Xcopies yytext to the scanner's output.
X.IP -
X.B BEGIN
Xfollowed by the name of a start condition places the scanner in the
Xcorresponding start condition (see below).
X.IP -
X.B REJECT
Xdirects the scanner to proceed on to the "second best" rule which matched the
Xinput (or a prefix of the input).  The rule is chosen as described
Xabove in "How the Input is Matched", and
X.B yytext
Xand
X.B yyleng
Xset up appropriately.
XIt may either be one which matched as much text
Xas the originally chosen rule but came later in the
X.I flex
Xinput file, or one which matched less text.
XFor example, the following will both count the
Xwords in the input and call the routine special() whenever "frob" is seen:
X.nf
X
X            int word_count = 0;
X    %%
X
X    frob        special(); REJECT;
X    [^ \\t\\n]+   ++word_count;
X
X.fi
XWithout the
X.B REJECT,
Xany "frob"'s in the input would not be counted as words, since the
Xscanner normally executes only one action per token.
XMultiple
X.B REJECT's
Xare allowed, each one finding the next best choice to the currently
Xactive rule.  For example, when the following scanner scans the token
X"abcd", it will write "abcdabcaba" to the output:
X.nf
X
X    %%
X    a        |
X    ab       |
X    abc      |
X    abcd     ECHO; REJECT;
X    .|\\n     /* eat up any unmatched character */
X
X.fi
X(The first three rules share the fourth's action since they use
Xthe special '|' action.)
X.B REJECT
Xis a particularly expensive feature in terms scanner performance;
Xif it is used in
X.I any
Xof the scanner's actions it will slow down
X.I all
Xof the scanner's matching.  Furthermore,
X.B REJECT
Xcannot be used with the
X.I -f
Xor
X.I -F
Xoptions (see below).
X.IP
XNote also that unlike the other special actions,
X.B REJECT
Xis a
X.I branch;
Xcode immediately following it in the action will
X.I not
Xbe executed.
X.IP -
X.B yymore()
Xtells the scanner that the next time it matches a rule, the corresponding
Xtoken should be
X.I appended
Xonto the current value of
X.B yytext
Xrather than replacing it.  For example, given the input "mega-kludge"
Xthe following will write "mega-mega-kludge" to the output:
X.nf
X
X    %%
X    mega-    ECHO; yymore();
X    kludge   ECHO;
X
X.fi
XFirst "mega-" is matched and echoed to the output.  Then "kludge"
Xis matched, but the previous "mega-" is still hanging around at the
Xbeginning of
X.B yytext
Xso the
X.B ECHO
Xfor the "kludge" rule will actually write "mega-kludge".
XThe presence of
X.B yymore()
Xin the scanner's action entails a minor performance penalty in the
Xscanner's matching speed.
X.IP -
X.B yyless(n)
Xreturns all but the first
X.I n
Xcharacters of the current token back to the input stream, where they
Xwill be rescanned when the scanner looks for the next match.
X.B yytext
Xand
X.B yyleng
Xare adjusted appropriately (e.g.,
X.B yyleng
Xwill now be equal to
X.I n
X).  For example, on the input "foobar" the following will write out
X"foobarbar":
X.nf
X
X    %%
X    foobar    ECHO; yyless(3);
X    [a-z]+    ECHO;
X
X.fi
XAn argument of 0 to
X.B yyless
Xwill cause the entire current input string to be scanned again.  Unless you've
Xchanged how the scanner will subsequently process its input (using
X.B BEGIN,
Xfor example), this will result in an endless loop.
X.IP -
X.B unput(c)
Xputs the character
X.I c
Xback onto the input stream.  It will be the next character scanned.
XThe following action will take the current token and cause it
Xto be rescanned enclosed in parentheses.
X.nf
X
X    {
X    int i;
X    unput( ')' );
X    for ( i = yyleng - 1; i >= 0; --i )
X        unput( yytext[i] );
X    unput( '(' );
X    }
X
X.fi
XNote that since each
X.B unput()
Xputs the given character back at the
X.I beginning
Xof the input stream, pushing back strings must be done back-to-front.
X.IP -
X.B input()
Xreads the next character from the input stream.  For example,
Xthe following is one way to eat up C comments:
X.nf
X
X    %%
X    "/*"        {
X                register int c;
X
X                for ( ; ; )
X                    {
X                    while ( (c = input()) != '*' &&
X                            c != EOF )
X                        ;    /* eat up text of comment */
X
X                    if ( c == '*' )
X                        {
X                        while ( (c = input()) == '*' )
X                            ;
X                        if ( c == '/' )
X                            break;    /* found the end */
X                        }
X
X                    if ( c == EOF )
X                        {
X                        error( "EOF in comment" );
X                        break;
X                        }
X                    }
X                }
X
X.fi
X(Note that if the scanner is compiled using
X.B C++,
Xthen
X.B input()
Xis instead referred to as
X.B yyinput(),
Xin order to avoid a name clash with the
X.B C++
Xstream by the name of
X.I input.)
X.IP -
X.B yyterminate()
Xcan be used in lieu of a return statement in an action.  It terminates
Xthe scanner and returns a 0 to the scanner's caller, indicating "all done".
XSubsequent calls to the scanner will immediately return unless preceded
Xby a call to
X.B yyrestart()
X(see below).
XBy default,
X.B yyterminate()
Xis also called when an end-of-file is encountered.  It is a macro and
Xmay be redefined.
X.SH THE GENERATED SCANNER
XThe output of
X.I flex
Xis the file
X.B lex.yy.c,
Xwhich contains the scanning routine
X.B yylex(),
Xa number of tables used by it for matching tokens, and a number
Xof auxiliary routines and macros.  By default,
X.B yylex()
Xis declared as follows:
X.nf
X
X    int yylex()
X        {
X        ... various definitions and the actions in here ...
X        }
X
X.fi
X(If your environment supports function prototypes, then it will
Xbe "int yylex( void )".)  This definition may be changed by redefining
Xthe "YY_DECL" macro.  For example, you could use:
X.nf
X
X    #undef YY_DECL
X    #define YY_DECL float lexscan( a, b ) float a, b;
X
X.fi
Xto give the scanning routine the name
X.I lexscan,
Xreturning a float, and taking two floats as arguments.  Note that
Xif you give arguments to the scanning routine using a
XK&R-style/non-prototyped function declaration, you must terminate
Xthe definition with a semi-colon (;).
X.LP
XWhenever
X.B yylex()
Xis called, it scans tokens from the global input file
X.I yyin
X(which defaults to stdin).  It continues until it either reaches
Xan end-of-file (at which point it returns the value 0) or
Xone of its actions executes a
X.I return
Xstatement.
XIn the former case, when called again the scanner will immediately
Xreturn unless
X.B yyrestart()
Xis called to point
X.I yyin
Xat the new input file.  (
X.B yyrestart()
Xtakes one argument, a
X.B FILE *
Xpointer.)
XIn the latter case (i.e., when an action
Xexecutes a return), the scanner may then be called again and it
Xwill resume scanning where it left off.
X.LP
XBy default (and for purposes of efficiency), the scanner uses
Xblock-reads rather than simple
X.I getc()
Xcalls to read characters from
X.I yyin.
XThe nature of how it gets its input can be controlled by redefining the
X.B YY_INPUT
Xmacro.
XYY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)".  Its
Xaction is to place up to
X.I max_size
Xcharacters in the character array
X.I buf
Xand return in the integer variable
X.I result
Xeither the
Xnumber of characters read or the constant YY_NULL (0 on Unix systems)
Xto indicate EOF.  The default YY_INPUT reads from the
Xglobal file-pointer "yyin".
X.LP
XA sample redefinition of YY_INPUT (in the definitions
Xsection of the input file):
X.nf
X
X    %{
X    #undef YY_INPUT
X    #define YY_INPUT(buf,result,max_size) \\
X        { \\
X        int c = getchar(); \\
X        result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
X        }
X    %}
X
X.fi
XThis definition will change the input processing to occur
Xone character at a time.
X.LP
XYou also can add in things like keeping track of the
Xinput line number this way; but don't expect your scanner to
Xgo very fast.
X.LP
XWhen the scanner receives an end-of-file indication from YY_INPUT,
Xit then checks the
X.B yywrap()
Xfunction.  If
X.B yywrap()
Xreturns false (zero), then it is assumed that the
Xfunction has gone ahead and set up
X.I yyin
Xto point to another input file, and scanning continues.  If it returns
Xtrue (non-zero), then the scanner terminates, returning 0 to its
Xcaller.
X.LP
XThe default
X.B yywrap()
Xalways returns 1.  Presently, to redefine it you must first
X"#undef yywrap", as it is currently implemented as a macro.  As indicated
Xby the hedging in the previous sentence, it may be changed to
Xa true function in the near future.
X.LP
XThe scanner writes its
X.B ECHO
Xoutput to the
X.I yyout
Xglobal (default, stdout), which may be redefined by the user simply
Xby assigning it to some other
X.B FILE
Xpointer.
X.SH START CONDITIONS
X.I flex
Xprovides a mechanism for conditionally activating rules.  Any rule
Xwhose pattern is prefixed with "<sc>" will only be active when
Xthe scanner is in the start condition named "sc".  For example,
X.nf
X
X    <STRING>[^"]*        { /* eat up the string body ... */
X                ...
X                }
X
X.fi
Xwill be active only when the scanner is in the "STRING" start
Xcondition, and
X.nf
X
X    <INITIAL,STRING,QUOTE>\\.        { /* handle an escape ... */
X                ...
X                }
X
X.fi
Xwill be active only when the current start condition is
Xeither "INITIAL", "STRING", or "QUOTE".
X.LP
XStart conditions
Xare declared in the definitions (first) section of the input
Xusing unindented lines beginning with either
X.B %s
Xor
X.B %x
Xfollowed by a list of names.
XThe former declares
X.I inclusive
Xstart conditions, the latter
X.I exclusive
Xstart conditions.  A start condition is activated using the
X.B BEGIN
Xaction.  Until the next
X.B BEGIN
Xaction is executed, rules with the given start
Xcondition will be active and
Xrules with other start conditions will be inactive.
XIf the start condition is
X.I inclusive,
Xthen rules with no start conditions at all will also be active.
XIf it is
X.I exclusive,
Xthen
X.I only
Xrules qualified with the start condition will be active.
XA set of rules contingent on the same exclusive start condition
Xdescribe a scanner which is independent of any of the other rules in the
X.I flex
Xinput.  Because of this,
Xexclusive start conditions make it easy to specify "mini-scanners"
Xwhich scan portions of the input that are syntactically different
Xfrom the rest (e.g., comments).
X.LP
XIf the distinction between inclusive and exclusive start conditions
Xis still a little vague, here's a simple example illustrating the
Xconnection between the two.  The set of rules:
X.nf
X
X    %s example
X    %%
X    <example>foo           /* do something */
X
X.fi
Xis equivalent to
X.nf
X
X    %x example
X    %%
X    <INITIAL,example>foo   /* do something */
X
X.fi
X.LP
XThe default rule (to
X.B ECHO
Xany unmatched character) remains active in start conditions.
X.LP
X.B BEGIN(0)
Xreturns to the original state where only the rules with
Xno start conditions are active.  This state can also be
Xreferred to as the start-condition "INITIAL", so
X.B BEGIN(INITIAL)
Xis equivalent to
X.B BEGIN(0).
X(The parentheses around the start condition name are not required but
Xare considered good style.)
X.LP
X.B BEGIN
Xactions can also be given as indented code at the beginning
Xof the rules section.  For example, the following will cause
Xthe scanner to enter the "SPECIAL" start condition whenever
X.I yylex()
Xis called and the global variable
X.I enter_special
Xis true:
X.nf
X
X            int enter_special;
X
X    %x SPECIAL
X    %%
X            if ( enter_special )
X                BEGIN(SPECIAL);
X
X    <SPECIAL>blahblahblah
X    ...more rules follow...
X
X.fi
X.LP
XTo illustrate the uses of start conditions,
Xhere is a scanner which provides two different interpretations
Xof a string like "123.456".  By default it will treat it as
Xas three tokens, the integer "123", a dot ('.'), and the integer "456".
XBut if the string is preceded earlier in the line by the string
X"expect-floats"
Xit will treat it as a single token, the floating-point number
X123.456:
X.nf
X
X    %{
X    #include <math.h>
X    %}
X    %s expect
X
X    %%
X    expect-floats        BEGIN(expect);
X
X    <expect>[0-9]+"."[0-9]+      {
X                printf( "found a float, = %f\\n",
X                        atof( yytext ) );
X                }
X    <expect>\\n           {
X                /* that's the end of the line, so
X                 * we need another "expect-number"
X                 * before we'll recognize any more
X                 * numbers
X                 */
X                BEGIN(INITIAL);
X                }
X
X    [0-9]+      {
X                printf( "found an integer, = %d\\n",
X                        atoi( yytext ) );
X                }
X
X    "."         printf( "found a dot\\n" );
X
X.fi
XHere is a scanner which recognizes (and discards) C comments while
Xmaintaining a count of the current input line.
X.nf
X
X    %x comment
X    %%
X            int line_num = 1;
X
X    "/*"         BEGIN(comment);
X
X    <comment>[^*\\n]*        /* eat anything that's not a '*' */
X    <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
X    <comment>\\n             ++line_num;
X    <comment>"*"+"/"        BEGIN(INITIAL);
X
X.fi
XNote that start-conditions names are really integer values and
Xcan be stored as such.  Thus, the above could be extended in the
Xfollowing fashion:
X.nf
X
X    %x comment foo
X    %%
X            int line_num = 1;
X            int comment_caller;
X
X    "/*"         {
X                 comment_caller = INITIAL;
X                 BEGIN(comment);
X                 }
X
X    ...
X
X    <foo>"/*"    {
X                 comment_caller = foo;
X                 BEGIN(comment);
X                 }
X
X    <comment>[^*\\n]*        /* eat anything that's not a '*' */
X    <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
X    <comment>\\n             ++line_num;
X    <comment>"*"+"/"        BEGIN(comment_caller);
X
X.fi
XOne can then implement a "stack" of start conditions using an
Xarray of integers.  (It is likely that such stacks will become
Xa full-fledged
X.I flex
Xfeature in the future.)  Note, though, that
Xstart conditions do not have their own name-space; %s's and %x's
Xdeclare names in the same fashion as #define's.
X.SH MULTIPLE INPUT BUFFERS
XSome scanners (such as those which support "include" files)
Xrequire reading from several input streams.  As
X.I flex
Xscanners do a large amount of buffering, one cannot control
Xwhere the next input will be read from by simply writing a
X.B YY_INPUT
Xwhich is sensitive to the scanning context.
X.B YY_INPUT
Xis only called when the scanner reaches the end of its buffer, which
Xmay be a long time after scanning a statement such as an "include"
Xwhich requires switching the input source.
X.LP
XTo negotiate these sorts of problems,
X.I flex
Xprovides a mechanism for creating and switching between multiple
Xinput buffers.  An input buffer is created by using:
X.nf
X
X    YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
X
X.fi
Xwhich takes a
X.I FILE
Xpointer and a size and creates a buffer associated with the given
Xfile and large enough to hold
X.I size
Xcharacters (when in doubt, use
X.B YY_BUF_SIZE
Xfor the size).  It returns a
X.B YY_BUFFER_STATE
Xhandle, which may then be passed to other routines:
X.nf
X
X    void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
X
X.fi
Xswitches the scanner's input buffer so subsequent tokens will
Xcome from
X.I new_buffer.
XNote that
X.B yy_switch_to_buffer()
Xmay be used by yywrap() to sets things up for continued scanning, instead
Xof opening a new file and pointing
X.I yyin
Xat it.
X.nf
X
X    void yy_delete_buffer( YY_BUFFER_STATE buffer )
X
X.fi
Xis used to reclaim the storage associated with a buffer.
X.LP
X.B yy_new_buffer()
Xis an alias for
X.B yy_create_buffer(),
Xprovided for compatibility with the C++ use of
X.I new
Xand
X.I delete
Xfor creating and destroying dynamic objects.
X.LP
XFinally, the
X.B YY_CURRENT_BUFFER
Xmacro returns a
X.B YY_BUFFER_STATE
Xhandle to the current buffer.
X.LP
XHere is an example of using these features for writing a scanner
Xwhich expands include files (the
X.B <<EOF>>
Xfeature is discussed below):
X.nf
X
X    /* the "incl" state is used for picking up the name
X     * of an include file
X     */
X    %x incl
X
X    %{
X    #define MAX_INCLUDE_DEPTH 10
X    YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
X    int include_stack_ptr = 0;
X    %}
X
X    %%
X    include             BEGIN(incl);
X
X    [a-z]+              ECHO;
X    [^a-z\\n]*\\n?        ECHO;
X
X    <incl>[ \\t]*      /* eat the whitespace */
X    <incl>[^ \\t\\n]+   { /* got the include file name */
X            if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
X                {
X                fprintf( stderr, "Includes nested too deeply" );
X                exit( 1 );
X                }
X
X            include_stack[include_stack_ptr++] =
X                YY_CURRENT_BUFFER;
X
X            yyin = fopen( yytext, "r" );
X
X            if ( ! yyin )
X                error( ... );
X
X            yy_switch_to_buffer(
X                yy_create_buffer( yyin, YY_BUF_SIZE ) );
X
X            BEGIN(INITIAL);
X            }
X
X    <<EOF>> {
X            if ( --include_stack_ptr < 0 )
X                {
X                yyterminate();
X                }
X
X            else
X                yy_switch_to_buffer(
X                     include_stack[include_stack_ptr] );
X            }
X
X.fi
X.SH END-OF-FILE RULES
XThe special rule "<<EOF>>" indicates
Xactions which are to be taken when an end-of-file is
Xencountered and yywrap() returns non-zero (i.e., indicates
Xno further files to process).  The action must finish
Xby doing one of four things:
X.IP -
Xthe special
X.B YY_NEW_FILE
Xaction, if
X.I yyin
Xhas been pointed at a new file to process;
X.IP -
Xa
X.I return
Xstatement;
X.IP -
Xthe special
X.B yyterminate()
Xaction;
X.IP -
Xor, switching to a new buffer using
X.B yy_switch_to_buffer()
Xas shown in the example above.
X.LP
X<<EOF>> rules may not be used with other
Xpatterns; they may only be qualified with a list of start
Xconditions.  If an unqualified <<EOF>> rule is given, it
Xapplies to
X.I all
Xstart conditions which do not already have <<EOF>> actions.  To
Xspecify an <<EOF>> rule for only the initial start condition, use
X.nf
X
X    <INITIAL><<EOF>>
X
X.fi
X.LP
XThese rules are useful for catching things like unclosed comments.
XAn example:
X.nf
X
X    %x quote
X    %%
X
X    ...other rules for dealing with quotes...
X
X    <quote><<EOF>>   {
X             error( "unterminated quote" );
X             yyterminate();
X             }
X    <<EOF>>  {
X             if ( *++filelist )
X                 {
X                 yyin = fopen( *filelist, "r" );
X                 YY_NEW_FILE;
X                 }
X             else
X                yyterminate();
X             }
X
X.fi
X.SH MISCELLANEOUS MACROS
XThe macro
X.bd
XYY_USER_ACTION
Xcan be redefined to provide an action
Xwhich is always executed prior to the matched rule's action.  For example,
Xit could be #define'd to call a routine to convert yytext to lower-case.
X.LP
XThe macro
X.B YY_USER_INIT
Xmay be redefined to provide an action which is always executed before
Xthe first scan (and before the scanner's internal initializations are done).
XFor example, it could be used to call a routine to read
Xin a data table or open a logging file.
X.LP
XIn the generated scanner, the actions are all gathered in one large
Xswitch statement and separated using
X.B YY_BREAK,
Xwhich may be redefined.  By default, it is simply a "break", to separate
Xeach rule's action from the following rule's.
XRedefining
X.B YY_BREAK
Xallows, for example, C++ users to
X#define YY_BREAK to do nothing (while being very careful that every
Xrule ends with a "break" or a "return"!) to avoid suffering from
Xunreachable statement warnings where because a rule's action ends with
X"return", the
X.B YY_BREAK
Xis inaccessible.
X.SH INTERFACING WITH YACC
XOne of the main uses of
X.I flex
Xis as a companion to the
X.I yacc
Xparser-generator.
X.I yacc
Xparsers expect to call a routine named
X.B yylex()
Xto find the next input token.  The routine is supposed to
Xreturn the type of the next token as well as putting any associated
Xvalue in the global
X.B yylval.
XTo use
X.I flex
Xwith
X.I yacc,
Xone specifies the
X.B -d
Xoption to
X.I yacc
Xto instruct it to generate the file
X.B y.tab.h
Xcontaining definitions of all the
X.B %tokens
Xappearing in the
X.I yacc
Xinput.  This file is then included in the
X.I flex
Xscanner.  For example, if one of the tokens is "TOK_NUMBER",
Xpart of the scanner might look like:
X.nf
X
X    %{
X    #include "y.tab.h"
X    %}
X
X    %%
X
X    [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
X
X.fi
X.SH TRANSLATION TABLE
XIn the name of POSIX compliance,
X.I flex
Xsupports a
X.I translation table
Xfor mapping input characters into groups.
XThe table is specified in the first section, and its format looks like:
X.nf
X
X    %t
X    1        abcd
X    2        ABCDEFGHIJKLMNOPQRSTUVWXYZ
X    52       0123456789
X    6        \\t\\ \\n
X    %t
X
X.fi
XThis example specifies that the characters 'a', 'b', 'c', and 'd'
Xare to all be lumped into group #1, upper-case letters
Xin group #2, digits in group #52, tabs, blanks, and newlines into
Xgroup #6, and
X.I
Xno other characters will appear in the patterns.
XThe group numbers are actually disregarded by
X.I flex;
X.B %t
Xserves, though, to lump characters together.  Given the above
Xtable, for example, the pattern "a(AA)*5" is equivalent to "d(ZQ)*0".
XThey both say, "match any character in group #1, followed by
Xzero-or-more pairs of characters
Xfrom group #2, followed by a character from group #52."  Thus
X.B %t
Xprovides a crude way for introducing equivalence classes into
Xthe scanner specification.
X.LP
XNote that the
X.B -i
Xoption (see below) coupled with the equivalence classes which
X.I flex
Xautomatically generates take care of virtually all the instances
Xwhen one might consider using
X.B %t.
XBut what the hell, it's there if you want it.
X.SH OPTIONS
X.I flex
Xhas the following options:
X.TP
X.B -b
XGenerate backtracking information to
X.I lex.backtrack.
XThis is a list of scanner states which require backtracking
Xand the input characters on which they do so.  By adding rules one
Xcan remove backtracking states.  If all backtracking states
Xare eliminated and
X.B -f
Xor
X.B -F
Xis used, the generated scanner will run faster (see the
X.B -p
Xflag).  Only users who wish to squeeze every last cycle out of their
Xscanners need worry about this option.  (See the section on PERFORMANCE
XCONSIDERATIONS below.)
X.TP
X.B -c
Xis a do-nothing, deprecated option included for POSIX compliance.
X.IP
X.B NOTE:
Xin previous releases of
X.I flex
X.B -c
Xspecified table-compression options.  This functionality is
Xnow given by the
X.B -C
Xflag.  To ease the the impact of this change, when
X.I flex
Xencounters
X.B -c,
Xit currently issues a warning message and assumes that
X.B -C
Xwas desired instead.  In the future this "promotion" of
X.B -c
Xto
X.B -C
Xwill go away in the name of full POSIX compliance (unless
Xthe POSIX meaning is removed first).
X.TP
X.B -d
Xmakes the generated scanner run in
X.I debug
Xmode.  Whenever a pattern is recognized and the global
X.B yy_flex_debug
Xis non-zero (which is the default),
Xthe scanner will write to
X.I stderr
Xa line of the form:
X.nf
X
X    --accepting rule at line 53 ("the matched text")
X
X.fi
XThe line number refers to the location of the rule in the file
Xdefining the scanner (i.e., the file that was fed to flex).  Messages
Xare also generated when the scanner backtracks, accepts the
Xdefault rule, reaches the end of its input buffer (or encounters
Xa NUL; at this point, the two look the same as far as the scanner's concerned),
Xor reaches an end-of-file.
X.TP
X.B -f
Xspecifies (take your pick)
X.I full table
Xor
X.I fast scanner.
XNo table compression is done.  The result is large but fast.
XThis option is equivalent to
X.B -Cf
X(see below).
X.TP
X.B -i
Xinstructs
X.I flex
Xto generate a
X.I case-insensitive
Xscanner.  The case of letters given in the
X.I flex
Xinput patterns will
Xbe ignored, and tokens in the input will be matched regardless of case.  The
Xmatched text given in
X.I yytext
Xwill have the preserved case (i.e., it will not be folded).
X.TP
X.B -n
Xis another do-nothing, deprecated option included only for
XPOSIX compliance.
X.TP
X.B -p
Xgenerates a performance report to stderr.  The report
Xconsists of comments regarding features of the
X.I flex
Xinput file which will cause a loss of performance in the resulting scanner.
XNote that the use of
X.I REJECT
Xand variable trailing context (see the BUGS section in flex(1))
Xentails a substantial performance penalty; use of
X.I yymore(),
Xthe
X.B ^
Xoperator,
Xand the
X.B -I
Xflag entail minor performance penalties.
X.TP
X.B -s
Xcauses the
X.I default rule
X(that unmatched scanner input is echoed to
X.I stdout)
Xto be suppressed.  If the scanner encounters input that does not
Xmatch any of its rules, it aborts with an error.  This option is
Xuseful for finding holes in a scanner's rule set.
X.TP
X.B -t
Xinstructs
X.I flex
Xto write the scanner it generates to standard output instead
Xof
X.B lex.yy.c.
X.TP
X.B -v
Xspecifies that
X.I flex
Xshould write to
X.I stderr
Xa summary of statistics regarding the scanner it generates.
XMost of the statistics are meaningless to the casual
X.I flex
Xuser, but the
Xfirst line identifies the version of
X.I flex,
Xwhich is useful for figuring
Xout where you stand with respect to patches and new releases,
Xand the next two lines give the date when the scanner was created
Xand a summary of the flags which were in effect.
X.TP
X.B -F
Xspecifies that the
X.ul
Xfast
Xscanner table representation should be used.  This representation is
Xabout as fast as the full table representation
X.ul
X(-f),
Xand for some sets of patterns will be considerably smaller (and for
Xothers, larger).  In general, if the pattern set contains both "keywords"
Xand a catch-all, "identifier" rule, such as in the set:
X.nf
X
X    "case"    return TOK_CASE;
X    "switch"  return TOK_SWITCH;
X    ...
X    "default" return TOK_DEFAULT;
X    [a-z]+    return TOK_ID;
X
X.fi
Xthen you're better off using the full table representation.  If only
Xthe "identifier" rule is present and you then use a hash table or some such
Xto detect the keywords, you're better off using
X.ul
X-F.
X.IP
XThis option is equivalent to
X.B -CF
X(see below).
X.TP
X.B -I
Xinstructs
X.I flex
Xto generate an
X.I interactive
Xscanner.  Normally, scanners generated by
X.I flex
Xalways look ahead one
Xcharacter before deciding that a rule has been matched.  At the cost of
Xsome scanning overhead,
X.I flex
Xwill generate a scanner which only looks ahead
Xwhen needed.  Such scanners are called
X.I interactive
Xbecause if you want to write a scanner for an interactive system such as a
Xcommand shell, you will probably want the user's input to be terminated
Xwith a newline, and without
X.B -I
Xthe user will have to type a character in addition to the newline in order
Xto have the newline recognized.  This leads to dreadful interactive
Xperformance.
X.IP
XIf all this seems to confusing, here's the general rule: if a human will
Xbe typing in input to your scanner, use
X.B -I,
Xotherwise don't; if you don't care about squeezing the utmost performance
Xfrom your scanner and you
Xdon't want to make any assumptions about the input to your scanner,
Xuse
X.B -I.
X.IP
XNote,
X.B -I
Xcannot be used in conjunction with
X.I full
Xor
X.I fast tables,
Xi.e., the
X.B -f, -F, -Cf,
Xor
X.B -CF
Xflags.
X.TP
X.B -L
Xinstructs
X.I flex
Xnot to generate
X.B #line
Xdirectives.  Without this option,
X.I flex
Xpeppers the generated scanner
Xwith #line directives so error messages in the actions will be correctly
Xlocated with respect to the original
X.I flex
Xinput file, and not to
Xthe fairly meaningless line numbers of
X.B lex.yy.c.
X(Unfortunately
X.I flex
Xdoes not presently generate the necessary directives
Xto "retarget" the line numbers for those parts of
X.B lex.yy.c
Xwhich it generated.  So if there is an error in the generated code,
Xa meaningless line number is reported.)
X.TP
X.B -T
Xmakes
X.I flex
Xrun in
X.I trace
Xmode.  It will generate a lot of messages to
X.I stdout
Xconcerning
Xthe form of the input and the resultant non-deterministic and deterministic
Xfinite automata.  This option is mostly for use in maintaining
X.I flex.
X.TP
X.B -8
Xinstructs
X.I flex
Xto generate an 8-bit scanner, i.e., one which can recognize 8-bit
Xcharacters.  On some sites,
X.I flex
Xis installed with this option as the default.  On others, the default
Xis 7-bit characters.  To see which is the case, check the verbose
X.B (-v)
Xoutput for "equivalence classes created".  If the denominator of
Xthe number shown is 128, then by default
X.I flex
Xis generating 7-bit characters.  If it is 256, then the default is
X8-bit characters and the
X.B -8
Xflag is not required (but may be a good idea to keep the scanner
Xspecification portable).  Feeding a 7-bit scanner 8-bit characters
Xwill result in infinite loops, bus errors, or other such fireworks,
Xso when in doubt, use the flag.  Note that if equivalence classes
Xare used, 8-bit scanners take only slightly more table space than
X7-bit scanners (128 bytes, to be exact); if equivalence classes are
Xnot used, however, then the tables may grow up to twice their
X7-bit size.
X.TP 
X.B -C[efmF]
Xcontrols the degree of table compression.
X.IP
X.B -Ce
Xdirects
X.I flex
Xto construct
X.I equivalence classes,
Xi.e., sets of characters
Xwhich have identical lexical properties (for example, if the only
Xappearance of digits in the
X.I flex
Xinput is in the character class
X"[0-9]" then the digits '0', '1', ..., '9' will all be put
Xin the same equivalence class).  Equivalence classes usually give
Xdramatic reductions in the final table/object file sizes (typically
Xa factor of 2-5) and are pretty cheap performance-wise (one array
Xlook-up per character scanned).
X.IP
X.B -Cf
Xspecifies that the
X.I full
Xscanner tables should be generated -
X.I flex
Xshould not compress the
Xtables by taking advantages of similar transition functions for
Xdifferent states.
X.IP
X.B -CF
Xspecifies that the alternate fast scanner representation (described
Xabove under the
X.B -F
Xflag)
Xshould be used.
X.IP
X.B -Cm
Xdirects
X.I flex
Xto construct
X.I meta-equivalence classes,
Xwhich are sets of equivalence classes (or characters, if equivalence
Xclasses are not being used) that are commonly used together.  Meta-equivalence
Xclasses are often a big win when using compressed tables, but they
Xhave a moderate performance impact (one or two "if" tests and one
Xarray look-up per character scanned).
X.IP
XA lone
X.B -C
Xspecifies that the scanner tables should be compressed but neither
Xequivalence classes nor meta-equivalence classes should be used.
X.IP
XThe options
X.B -Cf
Xor
X.B -CF
Xand
X.B -Cm
Xdo not make sense together - there is no opportunity for meta-equivalence
Xclasses if the table is not being compressed.  Otherwise the options
Xmay be freely mixed.
X.IP
XThe default setting is
X.B -Cem,
Xwhich specifies that
X.I flex
Xshould generate equivalence classes
Xand meta-equivalence classes.  This setting provides the highest
Xdegree of table compression.  You can trade off
Xfaster-executing scanners at the cost of larger tables with
Xthe following generally being true:
X.nf
X
X    slowest & smallest
X          -Cem
X          -Cm
X          -Ce
X          -C
X          -C{f,F}e
X          -C{f,F}
X    fastest & largest
X
X.fi
XNote that scanners with the smallest tables are usually generated and
Xcompiled the quickest, so
Xduring development you will usually want to use the default, maximal
Xcompression.
X.IP
X.B -Cfe
Xis often a good compromise between speed and size for production
Xscanners.
X.IP
X.B -C
Xoptions are not cumulative; whenever the flag is encountered, the
Xprevious -C settings are forgotten.
X.TP
X.B -Sskeleton_file
Xoverrides the default skeleton file from which
X.I flex
Xconstructs its scanners.  You'll never need this option unless you are doing
X.I flex
Xmaintenance or development.
X.SH PERFORMANCE CONSIDERATIONS
XThe main design goal of
X.I flex
Xis that it generate high-performance scanners.  It has been optimized
Xfor dealing well with large sets of rules.  Aside from the effects
Xof table compression on scanner speed outlined above,
Xthere are a number of options/actions which degrade performance.  These
Xare, from most expensive to least:
X.nf
X
X    REJECT
X
X    pattern sets that require backtracking
X    arbitrary trailing context
X
X    '^' beginning-of-line operator
X    yymore()
X
X.fi
Xwith the first three all being quite expensive and the last two
Xbeing quite cheap.
X.LP
X.B REJECT
Xshould be avoided at all costs when performance is important.
XIt is a particularly expensive option.
X.LP
XGetting rid of backtracking is messy and often may be an enormous
Xamount of work for a complicated scanner.  In principal, one begins
Xby using the
X.B -b 
Xflag to generate a
X.I lex.backtrack
Xfile.  For example, on the input
X.nf
X
X    %%
X    foo        return TOK_KEYWORD;
X    foobar     return TOK_KEYWORD;
X
X.fi
Xthe file looks like:
X.nf
X
X    State #6 is non-accepting -
X     associated rule line numbers:
X           2       3
X     out-transitions: [ o ]
X     jam-transitions: EOF [ \\001-n  p-\\177 ]
X
X    State #8 is non-accepting -
X     associated rule line numbers:
X           3
X     out-transitions: [ a ]
X     jam-transitions: EOF [ \\001-`  b-\\177 ]
X
X    State #9 is non-accepting -
X     associated rule line numbers:
X           3
X     out-transitions: [ r ]
X     jam-transitions: EOF [ \\001-q  s-\\177 ]
X
X    Compressed tables always backtrack.
X
X.fi
XThe first few lines tell us that there's a scanner state in
Xwhich it can make a transition on an 'o' but not on any other
Xcharacter, and that in that state the currently scanned text does not match
Xany rule.  The state occurs when trying to match the rules found
Xat lines 2 and 3 in the input file.
XIf the scanner is in that state and then reads
Xsomething other than an 'o', it will have to backtrack to find
Xa rule which is matched.  With
Xa bit of headscratching one can see that this must be the
Xstate it's in when it has seen "fo".  When this has happened,
Xif anything other than another 'o' is seen, the scanner will
Xhave to back up to simply match the 'f' (by the default rule).
X.LP
XThe comment regarding State #8 indicates there's a problem
Xwhen "foob" has been scanned.  Indeed, on any character other
Xthan a 'b', the scanner will have to back up to accept "foo".
XSimilarly, the comment for State #9 concerns when "fooba" has
Xbeen scanned.
X.LP
XThe final comment reminds us that there's no point going to
Xall the trouble of removing backtracking from the rules unless
Xwe're using
X.B -f
Xor
X.B -F,
Xsince there's no performance gain doing so with compressed scanners.
X.LP
XThe way to remove the backtracking is to add "error" rules:
X.nf
X
X    %%
X    foo         return TOK_KEYWORD;
X    foobar      return TOK_KEYWORD;
X
X    fooba       |
X    foob        |
X    fo          {
X                /* false alarm, not really a keyword */
X                return TOK_ID;
X                }
X
X.fi
X.LP
XEliminating backtracking among a list of keywords can also be
Xdone using a "catch-all" rule:
X.nf
X
X    %%
X    foo         return TOK_KEYWORD;
X    foobar      return TOK_KEYWORD;
X
X    [a-z]+      return TOK_ID;
X
X.fi
XThis is usually the best solution when appropriate.
X.LP
XBacktracking messages tend to cascade.
XWith a complicated set of rules it's not uncommon to get hundreds
Xof messages.  If one can decipher them, though, it often
Xonly takes a dozen or so rules to eliminate the backtracking (though
Xit's easy to make a mistake and have an error rule accidentally match
Xa valid token.  A possible future
X.I flex
Xfeature will be to automatically add rules to eliminate backtracking).
X.LP
X.I Variable
Xtrailing context (where both the leading and trailing parts do not have
Xa fixed length) entails almost the same performance loss as
X.I REJECT
X(i.e., substantial).  So when possible a rule like:
X.nf
X
X    %%
X    mouse|rat/(cat|dog)   run();
X
X.fi
Xis better written:
X.nf
X
X    %%
X    mouse/cat|dog         run();
X    rat/cat|dog           run();
X
X.fi
Xor as
X.nf
X
X    %%
X    mouse|rat/cat         run();
X    mouse|rat/dog         run();
X
X.fi
XNote that here the special '|' action does
X.I not
Xprovide any savings, and can even make things worse (see
X.B BUGS
Xin flex(1)).
X.LP
XAnother area where the user can increase a scanner's performance
X(and one that's easier to implement) arises from the fact that
Xthe longer the tokens matched, the faster the scanner will run.
XThis is because with long tokens the processing of most input
Xcharacters takes place in the (short) inner scanning loop, and
Xdoes not often have to go through the additional work of setting up
Xthe scanning environment (e.g.,
X.B yytext)
Xfor the action.  Recall the scanner for C comments:
X.nf
X
X    %x comment
X    %%
X            int line_num = 1;
X
X    "/*"         BEGIN(comment);
X
X    <comment>[^*\\n]*
X    <comment>"*"+[^*/\\n]*
X    <comment>\\n             ++line_num;
X    <comment>"*"+"/"        BEGIN(INITIAL);
X
X.fi
XThis could be sped up by writing it as:
X.nf
X
X    %x comment
X    %%
X            int line_num = 1;
X
X    "/*"         BEGIN(comment);
X
X    <comment>[^*\\n]*
X    <comment>[^*\\n]*\\n      ++line_num;
X    <comment>"*"+[^*/\\n]*
X    <comment>"*"+[^*/\\n]*\\n ++line_num;
X    <comment>"*"+"/"        BEGIN(INITIAL);
X
X.fi
XNow instead of each newline requiring the processing of another
Xaction, recognizing the newlines is "distributed" over the other rules
Xto keep the matched text as long as possible.  Note that
X.I adding
Xrules does
X.I not
Xslow down the scanner!  The speed of the scanner is independent
Xof the number of rules or (modulo the considerations given at the
Xbeginning of this section) how complicated the rules are with
Xregard to operators such as '*' and '|'.
X.LP
XA final example in speeding up a scanner: suppose you want to scan
Xthrough a file containing identifiers and keywords, one per line
Xand with no other extraneous characters, and recognize all the
Xkeywords.  A natural first approach is:
X.nf
X
X    %%
X    asm      |
X    auto     |
X    break    |
X    ... etc ...
X    volatile |
X    while    /* it's a keyword */
X
X    .|\\n     /* it's not a keyword */
X
X.fi
XTo eliminate the back-tracking, introduce a catch-all rule:
X.nf
X
X    %%
X    asm      |
X    auto     |
X    break    |
X    ... etc ...
X    volatile |
X    while    /* it's a keyword */
X
X    [a-z]+   |
X    .|\\n     /* it's not a keyword */
X
X.fi
XNow, if it's guaranteed that there's exactly one word per line,
Xthen we can reduce the total number of matches by a half by
Xmerging in the recognition of newlines with that of the other
Xtokens:
X.nf
X
X    %%
X    asm\\n    |
X    auto\\n   |
X    break\\n  |
X    ... etc ...
X    volatile\\n |
X    while\\n  /* it's a keyword */
X
X    [a-z]+\\n |
X    .|\\n     /* it's not a keyword */
X
X.fi
XOne has to be careful here, as we have now reintroduced backtracking
Xinto the scanner.  In particular, while
X.I we
Xknow that there will never be any characters in the input stream
Xother than letters or newlines,
X.I flex
Xcan't figure this out, and it will plan for possibly needing backtracking
Xwhen it has scanned a token like "auto" and then the next character
Xis something other than a newline or a letter.  Previously it would
Xthen just match the "auto" rule and be done, but now it has no "auto"
Xrule, only a "auto\\n" rule.  To eliminate the possibility of backtracking,
Xwe could either duplicate all rules but without final newlines, or,
Xsince we never expect to encounter such an input and therefore don't
Xhow it's classified, we can introduce one more catch-all rule, this
Xone which doesn't include a newline:
X.nf
X
X    %%
X    asm\\n    |
X    auto\\n   |
X    break\\n  |
X    ... etc ...
X    volatile\\n |
X    while\\n  /* it's a keyword */
X
X    [a-z]+\\n |
X    [a-z]+   |
X    .|\\n     /* it's not a keyword */
X
X.fi
XCompiled with
X.B -Cf,
Xthis is about as fast as one can get a
X.I flex 
Xscanner to go for this particular problem.
X.LP
XA final note:
X.I flex
Xis slow when matching NUL's, particularly when a token contains
Xmultiple NUL's.
XIt's best to write rules which match
X.I short
Xamounts of text if it's anticipated that the text will often include NUL's.
X.SH INCOMPATIBILITIES WITH LEX AND POSIX
X.I flex
Xis a rewrite of the Unix
X.I lex
Xtool (the two implementations do not share any code, though),
Xwith some extensions and incompatibilities, both of which
Xare of concern to those who wish to write scanners acceptable
Xto either implementation.  At present, the POSIX
X.I lex
Xdraft is
Xvery close to the original
X.I lex
Ximplementation, so some of these
Xincompatibilities are also in conflict with the POSIX draft.  But
Xthe intent is that except as noted below,
X.I flex
Xas it presently stands will
Xultimately be POSIX conformant (i.e., that those areas of conflict with
Xthe POSIX draft will be resolved in
X.I flex's
Xfavor).  Please bear in
Xmind that all the comments which follow are with regard to the POSIX
X.I draft
Xstandard of Summer 1989, and not the final document (or subsequent
Xdrafts); they are included so
X.I flex
Xusers can be aware of the standardization issues and those areas where
X.I flex
Xmay in the near future undergo changes incompatible with
Xits current definition.
X.LP
X.I flex
Xis fully compatible with
X.I lex
Xwith the following exceptions:
X.IP -
XThe undocumented
X.I lex
Xscanner internal variable
X.B yylineno
Xis not supported.  It is difficult to support this option efficiently,
Xsince it requires examining every character scanned and reexamining
Xthe characters when the scanner backs up.
XThings get more complicated when the end of buffer or file is reached or a
XNUL is scanned (since the scan must then be restarted with the proper line
Xnumber count), or the user uses the yyless(), unput(), or REJECT actions,
Xor the multiple input buffer functions.
X.IP
XThe fix is to add rules which, upon seeing a newline, increment
Xyylineno.  This is usually an easy process, though it can be a drag if some
Xof the patterns can match multiple newlines along with other characters.
X.IP
Xyylineno is not part of the POSIX draft.
X.IP -
XThe
X.B input()
Xroutine is not redefinable, though it may be called to read characters
Xfollowing whatever has been matched by a rule.  If
X.B input()
Xencounters an end-of-file the normal
X.B yywrap()
Xprocessing is done.  A ``real'' end-of-file is returned by
X.B input()
Xas
X.I EOF.
X.IP
XInput is instead controlled by redefining the
X.B YY_INPUT
Xmacro.
X.IP
XThe
X.I flex
Xrestriction that
X.B input()
Xcannot be redefined is in accordance with the POSIX draft, but
X.B YY_INPUT
Xhas not yet been accepted into the draft (and probably won't; it looks
Xlike the draft will simply not specify any way of controlling the
Xscanner's input other than by making an initial assignment to
X.I yyin).
X.IP -
X.I flex
Xscanners do not use stdio for input.  Because of this, when writing an
Xinteractive scanner one must explicitly call fflush() on the
Xstream associated with the terminal after writing out a prompt.
XWith
X.I lex
Xsuch writes are automatically flushed since
X.I lex
Xscanners use
X.B getchar()
Xfor their input.  Also, when writing interactive scanners with
X.I flex,
Xthe
X.B -I
Xflag must be used.
X.IP -
X.I flex
Xscanners are not as reentrant as
X.I lex
Xscanners.  In particular, if you have an interactive scanner and
Xan interrupt handler which long-jumps out of the scanner, and
Xthe scanner is subsequently called again, you may get the following
Xmessage:
X.nf
X
X    fatal flex scanner internal error--end of buffer missed
X
X.fi
XTo reenter the scanner, first use
X.nf
X
X    yyrestart( yyin );
X
X.fi
X.IP -
X.B output()
Xis not supported.
XOutput from the
X.B ECHO
Xmacro is done to the file-pointer
X.I yyout
X(default
X.I stdout).
X.IP
XThe POSIX draft mentions that an
X.B output()
Xroutine exists but currently gives no details as to what it does.
X.IP -
X.I lex
Xdoes not support exclusive start conditions (%x), though they
Xare in the current POSIX draft.
X.IP -
XWhen definitions are expanded,
X.I flex
Xencloses them in parentheses.
XWith lex, the following:
X.nf
X
X    NAME    [A-Z][A-Z0-9]*
X    %%
X    foo{NAME}?      printf( "Found it\\n" );
X    %%
X
X.fi
Xwill not match the string "foo" because when the macro
Xis expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
Xand the precedence is such that the '?' is associated with
X"[A-Z0-9]*".  With
X.I flex,
Xthe rule will be expanded to
X"foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
XNote that because of this, the
X.B ^, $, <s>, /,
Xand
X.B <<EOF>>
Xoperators cannot be used in a
X.I flex
Xdefinition.
X.IP
XThe POSIX draft interpretation is the same as
X.I flex's.
X.IP -
XTo specify a character class which matches anything but a left bracket (']'),
Xin
X.I lex
Xone can use "[^]]" but with
X.I flex
Xone must use "[^\\]]".  The latter works with
X.I lex,
Xtoo.
X.IP -
XThe
X.I lex
X.B %r
X(generate a Ratfor scanner) option is not supported.  It is not part
Xof the POSIX draft.
X.IP -
XIf you are providing your own yywrap() routine, you must include a
X"#undef yywrap" in the definitions section (section 1).  Note that
Xthe "#undef" will have to be enclosed in %{}'s.
X.IP
XThe POSIX draft
Xspecifies that yywrap() is a function and this is very unlikely to change; so
X.I flex users are warned
Xthat
X.B yywrap()
Xis likely to be changed to a function in the near future.
X.IP -
XAfter a call to
X.B unput(),
X.I yytext
Xand
X.I yyleng
Xare undefined until the next token is matched.  This is not the case with
X.I lex
Xor the present POSIX draft.
X.IP -
XThe precedence of the
X.B {}
X(numeric range) operator is different.
X.I lex
Xinterprets "abc{1,3}" as "match one, two, or
Xthree occurrences of 'abc'", whereas
X.I flex
Xinterprets it as "match 'ab'
Xfollowed by one, two, or three occurrences of 'c'".  The latter is
Xin agreement with the current POSIX draft.
X.IP -
XThe precedence of the
X.B ^
Xoperator is different.
X.I lex
Xinterprets "^foo|bar" as "match either 'foo' at the beginning of a line,
Xor 'bar' anywhere", whereas
X.I flex
Xinterprets it as "match either 'foo' or 'bar' if they come at the beginning
Xof a line".  The latter is in agreement with the current POSIX draft.
X.IP -
XTo refer to yytext outside of the scanner source file,
Xthe correct definition with
X.I flex
Xis "extern char *yytext" rather than "extern char yytext[]".
XThis is contrary to the current POSIX draft but a point on which
X.I flex
Xwill not be changing, as the array representation entails a
Xserious performance penalty.  It is hoped that the POSIX draft will
Xbe emended to support the
X.I flex
Xvariety of declaration (as this is a fairly painless change to
Xrequire of
X.I lex
Xusers).
X.IP -
X.I yyin
Xis
X.I initialized
Xby
X.I lex
Xto be
X.I stdin;
X.I flex,
Xon the other hand,
Xinitializes
X.I yyin
Xto NULL
Xand then
X.I assigns
Xit to
X.I stdin
Xthe first time the scanner is called, providing
X.I yyin
Xhas not already been assigned to a non-NULL value.  The difference is
Xsubtle, but the net effect is that with
X.I flex
Xscanners,
X.I yyin
Xdoes not have a valid value until the scanner has been called.
X.IP -
XThe special table-size declarations such as
X.B %a
Xsupported by
X.I lex
Xare not required by
X.I flex
Xscanners;
X.I flex
Xignores them.
X.IP -
XThe name
X.bd
XFLEX_SCANNER
Xis #define'd so scanners may be written for use with either
X.I flex
Xor
X.I lex.
X.LP
XThe following
X.I flex
Xfeatures are not included in
X.I lex
Xor the POSIX draft standard:
X.nf
X
X    yyterminate()
X    <<EOF>>
X    YY_DECL
X    #line directives
X    %{}'s around actions
X    yyrestart()
X    comments beginning with '#' (deprecated)
X    multiple actions on a line
X
X.fi
XThis last feature refers to the fact that with
X.I flex
Xyou can put multiple actions on the same line, separated with
Xsemi-colons, while with
X.I lex,
Xthe following
X.nf
X
X    foo    handle_foo(); ++num_foos_seen;
X
X.fi
Xis (rather surprisingly) truncated to
X.nf
X
X    foo    handle_foo();
X
X.fi
X.I flex
Xdoes not truncate the action.  Actions that are not enclosed in
Xbraces are simply terminated at the end of the line.
X.SH DIAGNOSTICS
X.I reject_used_but_not_detected undefined
Xor
X.I yymore_used_but_not_detected undefined -
XThese errors can occur at compile time.  They indicate that the
Xscanner uses
X.B REJECT
Xor
X.B yymore()
Xbut that
X.I flex
Xfailed to notice the fact, meaning that
X.I flex
Xscanned the first two sections looking for occurrences of these actions
Xand failed to find any, but somehow you snuck some in (via a #include
Xfile, for example).  Make an explicit reference to the action in your
X.I flex
Xinput file.  (Note that previously
X.I flex
Xsupported a
X.B %used/%unused
Xmechanism for dealing with this problem; this feature is still supported
Xbut now deprecated, and will go away soon unless the author hears from
Xpeople who can argue compellingly that they need it.)
X.LP
X.I flex scanner jammed -
Xa scanner compiled with
X.B -s
Xhas encountered an input string which wasn't matched by
Xany of its rules.
X.LP
X.I flex input buffer overflowed -
Xa scanner rule matched a string long enough to overflow the
Xscanner's internal input buffer (16K bytes by default - controlled by
X.B YY_BUF_SIZE
Xin "flex.skel".  Note that to redefine this macro, you must first
X.B #undefine
Xit).
X.LP
X.I scanner requires -8 flag -
XYour scanner specification includes recognizing 8-bit characters and
Xyou did not specify the -8 flag (and your site has not installed flex
Xwith -8 as the default).
X.LP
X.I
Xfatal flex scanner internal error--end of buffer missed -
XThis can occur in an scanner which is reentered after a long-jump
Xhas jumped out (or over) the scanner's activation frame.  Before
Xreentering the scanner, use:
X.nf
X
X    yyrestart( yyin );
X
X.fi
X.LP
X.I too many %t classes! -
XYou managed to put every single character into its own %t class.
X.I flex
Xrequires that at least one of the classes share characters.
X.SH DEFICIENCIES / BUGS
XSee flex(1).
X.SH "SEE ALSO"
X.LP
Xflex(1), lex(1), yacc(1), sed(1), awk(1).
X.LP
XM. E. Lesk and E. Schmidt,
X.I LEX - Lexical Analyzer Generator
X.SH AUTHOR
XVern Paxson, with the help of many ideas and much inspiration from
XVan Jacobson.  Original version by Jef Poskanzer.  The fast table
Xrepresentation is a partial implementation of a design done by Van
XJacobson.  The implementation was done by Kevin Gong and Vern Paxson.
X.LP
XThanks to the many
X.I flex
Xbeta-testers, feedbackers, and contributors, especially Casey
XLeedom, benson@odi.com, Keith Bostic,
XFrederic Brehm, Nick Christopher, Jason Coughlin,
XScott David Daniels, Leo Eskin,
XChris Faylor, Eric Goldman, Eric
XHughes, Jeffrey R. Jones, Kevin B. Kenny, Ronald Lamprecht,
XGreg Lee, Craig Leres, Mohamed el Lozy, Jim Meyering, Marc Nozell, Esmond Pitt,
XJef Poskanzer, Jim Roskind,
XDave Tallman, Frank Whaley, Ken Yap, and those whose names
Xhave slipped my marginal mail-archiving skills but whose contributions
Xare appreciated all the same.
X.LP
XThanks to Keith Bostic, John Gilmore, Craig Leres, Bob
XMulcahy, Rich Salz, and Richard Stallman for help with various distribution
Xheadaches.
X.LP
XThanks to Esmond Pitt and Earle Horton for 8-bit character support;
Xto Benson Margulies and Fred
XBurke for C++ support; to Ove Ewerlid for the basics of support for
XNUL's; and to Eric Hughes for the basics of support for multiple buffers.
X.LP
XWork is being done on extending
X.I flex
Xto generate scanners in which the
Xstate machine is directly represented in C code rather than tables.
XThese scanners may well be substantially faster than those generated
Xusing -f or -F.  If you are working in this area and are interested
Xin comparing notes and seeing whether redundant work can be avoided,
Xcontact Ove Ewerlid (ewerlid@mizar.DoCS.UU.SE).
X.LP
XThis work was primarily done when I was at the Real Time Systems Group
Xat the Lawrence Berkeley Laboratory in Berkeley, CA.  Many thanks to all there
Xfor the support I received.
X.LP
XSend comments to:
X.nf
X
X     Vern Paxson
X     Computer Science Department
X     4126 Upson Hall
X     Cornell University
X     Ithaca, NY 14853-7501
X
X     vern@cs.cornell.edu
X     decvax!cornell!vern
X
X.fi
END_OF_FILE
if test 65353 -ne `wc -c <'flexdoc.1'`; then
    echo shar: \"'flexdoc.1'\" unpacked with wrong size!
fi
# end of 'flexdoc.1'
fi
echo shar: End of archive 11 \(of 13\).
cp /dev/null ark11isdone
MISSING=""
for I in 1 2 3 4 5 6 7 8 9 10 11 12 13 ; do
    if test ! -f ark${I}isdone ; then
	MISSING="${MISSING} ${I}"
    fi
done
if test "${MISSING}" = "" ; then
    echo You have unpacked all 13 archives.
    rm -f ark[1-9]isdone ark[1-9][0-9]isdone
else
    echo You still need to unpack the following archives:
    echo "        " ${MISSING}
fi
##  End of shell archive.
exit 0
-- 
Mail submissions (sources or binaries) to <amiga@uunet.uu.net>.
Mail comments to the moderator at <amiga-request@uunet.uu.net>.
Post requests for sources, and general discussion to comp.sys.amiga.