rekers@cwi.nl (Jan Rekers) (06/27/90)
In article <1990Jun21.033349.2983@esegue.segue.boston.ma.us>, andrea@eric.mpr.ca (Jennitta Andrea) writes:
|>I have two regular expressions:
|>
|>{D}{D}":"{D}{D}":"{D}{D}   { /* recognize "TIMESTAMP" token */ }
|>{STRING}                   { /* recognize STRING token */ }
|>
|>Because my definition of a "STRING" is so general, the following input
|>stream:
|>
|>    12:30:49AC
|>
|>is tokenized into a single STRING token ("12:30:49AC"), rather than into a
|>TIMESTAMP token ("12:30:49") and a STRING token ("AC").

The most general solution to this problem would be to allow multiple
lexical channels, which are fed to a parser that can split up (as the
Tomita algorithm can, for example). On the input 12:30:49AC the lexer
returns two token streams:

    (timestamp: 12:30:49)
    (string: 12:30:49AC)

The parser splits into one parser per possibility, each of which obtains
its own lexical channel. The next tokens in the channels decide which of
the parsers wins. This is a quite general (and inefficient) solution,
which can also be used to solve the lexing and parsing of FORTRAN in a
very neat manner.

We are considering implementing the above solution; if anybody knows
more about it, please let us know...

Jan Rekers (rekers@cwi.nl)
Centre for Mathematics and Computer Science
P.O. Box 4079, 1009 AB Amsterdam, The Netherlands
--
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus | world}!esegue.  Meta-mail to compilers-request@esegue.
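[A minimal sketch of the split-stream idea, in Python for brevity. The patterns, names, and the toy "grammar" (a TIMESTAMP followed by a STRING) are illustrative assumptions, not part of any real lexer generator: each pattern that matches at the current position opens a new channel, and the parser tries each channel and keeps the one the grammar accepts.]

```python
import re

# Illustrative token patterns (hypothetical, mirroring the {D} and {STRING}
# definitions quoted above).
TIMESTAMP = re.compile(r"\d\d:\d\d:\d\d")
STRING    = re.compile(r"[0-9A-Za-z:]+")

def lex_all(text, pos=0):
    """Return every tokenization of text[pos:] as a list of token streams.

    Each pattern that matches at the current position opens a separate
    channel, mimicking a lexer that feeds multiple lexical channels to
    the parser instead of committing to the longest match."""
    if pos == len(text):
        return [[]]          # one channel: the empty remainder
    streams = []
    for name, pat in (("TIMESTAMP", TIMESTAMP), ("STRING", STRING)):
        m = pat.match(text, pos)
        if m and m.end() > pos:
            for rest in lex_all(text, m.end()):
                streams.append([(name, m.group())] + rest)
    return streams

def parse(streams):
    """The parser 'splits up': every channel is pursued, and the one
    whose token sequence the grammar accepts wins.  The toy grammar
    here accepts exactly: TIMESTAMP STRING."""
    for s in streams:
        if [kind for kind, _ in s] == ["TIMESTAMP", "STRING"]:
            return s
    return None

print(parse(lex_all("12:30:49AC")))
# → [('TIMESTAMP', '12:30:49'), ('STRING', 'AC')]
```

On 12:30:49AC the lexer opens two channels, one beginning with the timestamp token and one with the single long string token; the grammar then selects the former, which is exactly the disambiguation-by-parser behaviour described above.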