dfz (12/02/82)
This is an answer to a "lex" question asked on the net-wizards newsgroup. Please excuse the redundancy: this is my last try!

To: Randy Bentson
Address: harpo!decvax!cca!hplabs!hao!csu-cs!bentson@sri-unix

Lex allows the user to insert a user-tailored set of states in front of the lex-generated finite state machine. These states can divide the lex-generated FSA into as many independent FSAs as the user wants. This trick comes in handy for lexical analysis of input that contains delimited sections whose rules of analysis differ greatly from each other (e.g. comments or strings vs. C language text). Most C compilers don't treat comments this way because the C preprocessor strips comments out; the lexical analyzer of the C parser never sees them.

The technique is to use the "%start" directive (see the lex manual). Here is an example:

---------------------------------------------------
%start normal comment
%%
<normal> ...        { code for expression }
<normal> ...        { code for expression }
...
...
...
<normal>"/*"        { BEGIN comment; }
<comment>"*/"       { BEGIN normal; }
<comment>\n         {}
<comment>.          {}
---------------------------------------------------

In the above example, the lexical rules are divided into two sets: the rules for <comment> are entirely independent of those for <normal>. Furthermore, the action taken upon recognizing an expression in one state (e.g. <normal>) can jump to another state (e.g. <comment>). Thus one has effectively created two independent FSAs.

Lex uses some sort of push-down stack, so that when you enter another state (using the BEGIN macro) and then return, your current state is preserved. All pattern recognition done in the other state effectively removes the text it recognized from the view of the current state.
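For readers without lex at hand, the two-state scheme above can be imitated in plain C. The following is a hand-written loop, not what lex generates. The function strip_comments() and the enum lex_state are hypothetical names made up for the sketch. Each case of the switch stands in for one state's independent rule set, and assigning to st stands in for BEGIN. The sketch starts in NORMAL explicitly. A lex scanner begins in start condition 0, so the <normal> rules do not fire until some action has executed BEGIN normal.

```c
#include <assert.h>
#include <stddef.h>

// The two start states from the %start example (illustrative names only;
// lex keeps its own internal state numbers).
enum lex_state { NORMAL, COMMENT };

// Copies src into dst with C-style comments removed. Each state has its
// own independent set of "rules", and a rule may switch states the way a
// BEGIN action does. dst must have room for strlen(src) + 1 bytes.
void strip_comments(const char *src, char *dst)
{
    enum lex_state st = NORMAL;   // start in NORMAL explicitly
    assert(src != NULL && dst != NULL);

    while (*src != '\0') {
        switch (st) {
        case NORMAL:
            if (src[0] == '/' && src[1] == '*') {   // <normal>"/*"
                st = COMMENT;                       // BEGIN comment;
                src += 2;
            } else {
                *dst++ = *src++;                    // other <normal> text: copy it
            }
            break;
        case COMMENT:
            if (src[0] == '*' && src[1] == '/') {   // <comment>"*/"
                st = NORMAL;                        // BEGIN normal;
                src += 2;
            } else {
                src++;                              // <comment>\n and <comment>. : discard
            }
            break;
        }
    }
    *dst = '\0';
}
```

For example, strip_comments("a/* x */b", out) leaves "ab" in out. The comment text never reaches the NORMAL rules, which is the "removes the text from the view of the current state" effect described above.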
Jumping between states (I should say pushing and popping) can be nested and even done recursively, although most applications do not need this. Below is a second example, which uses three separate states to recognize normal text, comments, and strings:

---------------------------------------------------
%start normal comment string
%%
<normal> ...        { code for expression }
<normal> ...        { code for expression }
...
...
...
<normal>"/*"        { BEGIN comment; }
<comment>"*/"       { BEGIN normal; }
<comment>\n         {}
<comment>.          {}
<normal>\"          { BEGIN string; }
<string>\"          { BEGIN normal; }
<string> ...        { code for expression }
...
...
...
---------------------------------------------------

I hope I have answered your question.

Dave Ziffer
Bell Labs in Naperville, Illinois
...!decvax!harpo!iwlc7!dfz
tihor (12/03/82)
#R:iwlc7:-10800:cmcl2:11400003:000:284
cmcl2!tihor    Dec  2 14:45:00 1982

It is not clear from your message how one tells lex to push the current state and begin a new invocation of an automaton that is already on the stack. The obvious syntax

	BEGIN newautomaton;

seems to be used to pop back to a previous instance of newautomaton that is already stacked.
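In standard lex, BEGIN simply sets the current start condition and keeps no stack. A BEGIN normal; therefore goes to normal regardless of where the scanner came from. (Flex later added yy_push_state() and yy_pop_state() for this purpose, under %option stack.) The push-and-pop behaviour asked about can instead be kept by hand.

The C sketch below is only illustrative. The names scanner, push_state(), pop_state(), strip_nested() and MAX_DEPTH are hypothetical. Comments are allowed to nest, which C itself does not permit, purely so that the same COMMENT automaton is pushed recursively.

```c
#include <assert.h>
#include <stddef.h>

enum lex_state { NORMAL, COMMENT, STRING };

#define MAX_DEPTH 32   // arbitrary nesting limit for this sketch

// Hypothetical scanner record: the current state plus a stack of saved ones.
struct scanner {
    enum lex_state cur;
    enum lex_state stack[MAX_DEPTH];
    int depth;
};

// Save the current state and switch to next; returns 0, or -1 if full.
static int push_state(struct scanner *s, enum lex_state next)
{
    if (s->depth == MAX_DEPTH)
        return -1;
    s->stack[s->depth++] = s->cur;
    s->cur = next;
    return 0;
}

// Return to the most recently saved state (no-op if the stack is empty).
static void pop_state(struct scanner *s)
{
    if (s->depth > 0)
        s->cur = s->stack[--s->depth];
}

// Copies src into dst (room for strlen(src) + 1 bytes) with comments removed.
// Comments may nest here, and a "/*" inside "..." is copied as string text.
// Backslash escapes in strings are ignored. Returns 0, or -1 if nesting
// exceeds MAX_DEPTH (dst then holds the text copied so far).
int strip_nested(const char *src, char *dst)
{
    struct scanner s = { NORMAL, { NORMAL }, 0 };
    int ok = 1;
    assert(src != NULL && dst != NULL);

    while (*src != '\0' && ok) {
        int at_open = (src[0] == '/' && src[1] == '*');
        int at_close = (src[0] == '*' && src[1] == '/');

        switch (s.cur) {
        case NORMAL:
            if (at_open) {                        // enter a comment: push
                ok = push_state(&s, COMMENT) == 0;
                src += 2;
            } else {
                if (*src == '"')                  // enter a string: push
                    ok = push_state(&s, STRING) == 0;
                *dst++ = *src++;
            }
            break;
        case COMMENT:
            if (at_open) {                        // nested comment: push again
                ok = push_state(&s, COMMENT) == 0;
                src += 2;
            } else if (at_close) {                // leave one level: pop
                pop_state(&s);
                src += 2;
            } else {
                src++;                            // discard comment text
            }
            break;
        case STRING:
            if (*src == '"')                      // closing quote: pop
                pop_state(&s);
            *dst++ = *src++;
            break;
        }
    }
    *dst = '\0';
    return ok ? 0 : -1;
}
```

With this sketch, strip_nested("a/* b /* c */ d */e", out) leaves "ae" in out. The inner "*/" pops back to the outer COMMENT rather than to NORMAL, which a bare BEGIN normal; could not express.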