rekers@cwi.nl (Jan Rekers) (06/27/90)
In article <1990Jun21.033349.2983@esegue.segue.boston.ma.us>, andrea@eric.mpr.ca (Jennitta Andrea) writes:
|>I have two regular expressions:
|>
|>{D}{D}":"{D}{D}":"{D}{D}   { /* recognize "TIMESTAMP" token */ }
|>{STRING}                   { /* recognize STRING token */ }
|>
|>Because my definition of a "STRING" is so general, the following input
|>stream:
|>
|>    12:30:49AC
|>
|>is tokenized into a single STRING token ("12:30:49AC"), rather than into a
|>TIMESTAMP token ("12:30:49") and a STRING token ("AC").

The most general solution to this problem would be to allow multiple
lexical channels, which are fed to a parser that can split up (as the
Tomita algorithm can, for example). On the input 12:30:49AC the lexer
returns two token streams:

    (timestamp: 12:30:49)
    (string: 12:30:49AC)

The parser splits into one parser per possibility, each of which obtains
its own lexical channel. The next tokens in the channels decide which of
the parsers wins. This is a quite general (and inefficient) solution,
which can also be used to solve the lexing and parsing of FORTRAN in a
very neat manner.

We are considering implementing the above solution; if anybody knows
more about it, please let us know...

Jan Rekers (rekers@cwi.nl)
Centre for Mathematics and Computer Science
P.O. Box 4079, 1009 AB Amsterdam, The Netherlands
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus | world}!esegue.  Meta-mail to compilers-request@esegue.
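[A minimal sketch of the split-stream idea, in Python for brevity. The patterns, names, and the toy "grammar" (a TIMESTAMP followed by a STRING) are illustrative assumptions, not part of any real lexer generator: each pattern that matches at the current position opens a new channel, and the parser tries each channel and keeps the one the grammar accepts.]

```python
import re

# Illustrative token patterns (hypothetical, mirroring the {D} and {STRING}
# definitions quoted above).
TIMESTAMP = re.compile(r"\d\d:\d\d:\d\d")
STRING    = re.compile(r"[0-9A-Za-z:]+")

def lex_all(text, pos=0):
    """Return every tokenization of text[pos:] as a list of token streams.

    Each pattern that matches at the current position opens a separate
    channel, mimicking a lexer that feeds multiple lexical channels to
    the parser instead of committing to the longest match."""
    if pos == len(text):
        return [[]]          # one channel: the empty remainder
    streams = []
    for name, pat in (("TIMESTAMP", TIMESTAMP), ("STRING", STRING)):
        m = pat.match(text, pos)
        if m and m.end() > pos:
            for rest in lex_all(text, m.end()):
                streams.append([(name, m.group())] + rest)
    return streams

def parse(streams):
    """The parser 'splits up': every channel is pursued, and the one
    whose token sequence the grammar accepts wins.  The toy grammar
    here accepts exactly: TIMESTAMP STRING."""
    for s in streams:
        if [kind for kind, _ in s] == ["TIMESTAMP", "STRING"]:
            return s
    return None

print(parse(lex_all("12:30:49AC")))
# → [('TIMESTAMP', '12:30:49'), ('STRING', 'AC')]
```

On 12:30:49AC the lexer opens two channels, one beginning with the timestamp token and one with the single long string token; the grammar then selects the former, which is exactly the disambiguation-by-parser behaviour described above.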