[comp.compilers] Lex and Yacc - Availability?

johnl@ima.UUCP (08/12/87)

I'm building a compiler for Algol 68, which presents some interesting
tokenizing and parsing problems.  Right now, I'm using a p.d. Lex, but
I've heard bad things said about Lex in general, usually that it's
slow.  Does anyone out there know of a (semi-)p.d. Lex-type program
that is better?  Or, more generally, is there a truly better way to
tokenize?

As far as Yacc goes, it seems to me that the power of LALR over LL
parsing and the fact that it is table-driven are big wins, over and
above the development advantages.  (Table-driven gives you smaller
parsers for large languages, access to the entire parse state for
error diagnostics, and the ability to build other tools that use the
same tables, e.g., for debugging the grammar.)  People like to claim
that Yacc is slow, but has anyone really investigated this?
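
To make the table-driven point concrete, here is a minimal sketch of the
general shift/reduce driver loop that LALR parser generators such as yacc
emit.  It uses hand-built SLR(1) tables for a hypothetical two-rule
grammar (not Algol 68, and not yacc's actual table encoding); the names
`lr_parse`, `action` and `goto_E` are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical toy grammar (illustration only):
 *   rule 1: E -> E '+' NUM      rule 2: E -> NUM
 * The tables below were built by hand with the SLR(1) construction. */
enum { NUM, PLUS, END, NTOKENS };                  /* terminal symbols */
static const char *tok_name[NTOKENS] = { "NUM", "'+'", "end of input" };

/* kind: 's' = shift to state arg, 'r' = reduce by rule arg,
 *       'a' = accept, 0 = syntax error */
typedef struct { char kind; int arg; } Action;

static const Action action[5][NTOKENS] = {
    /* state 0 */ { {'s', 2}, {0, 0},   {0, 0}   },
    /* state 1 */ { {0, 0},   {'s', 3}, {'a', 0} },
    /* state 2 */ { {0, 0},   {'r', 2}, {'r', 2} },
    /* state 3 */ { {'s', 4}, {0, 0},   {0, 0}   },
    /* state 4 */ { {0, 0},   {'r', 1}, {'r', 1} },
};
static const int goto_E[5]   = { 1, -1, -1, -1, -1 }; /* GOTO on E   */
static const int rule_len[3] = { 0, 3, 1 };           /* RHS lengths */

typedef struct { int type; int value; } Token;

/* Parses toks[] (terminated by an END token).  Returns 1 and stores the
 * sum of the NUMs in *result, or returns 0 after printing a diagnostic. */
int lr_parse(const Token *toks, int *result)
{
    int states[64], values[64], sp = 0;   /* parallel state/value stacks */
    const Token *t = toks;
    states[0] = 0;
    for (;;) {
        Action a = action[states[sp]][t->type];
        switch (a.kind) {
        case 's':
            if (sp + 1 >= 64) return 0;   /* stack overflow */
            sp++;
            states[sp] = a.arg;
            values[sp] = t->value;
            t++;
            break;
        case 'r': {
            /* semantic action: rule 1 adds, rule 2 passes the NUM through */
            int v = (a.arg == 1) ? values[sp - 2] + values[sp] : values[sp];
            sp -= rule_len[a.arg];                 /* pop the right-hand side */
            states[sp + 1] = goto_E[states[sp]];   /* push GOTO(state, E)     */
            values[sp + 1] = v;
            sp++;
            break;
        }
        case 'a':
            *result = values[sp];
            return 1;
        default:
            /* The parse state is plain data, so the diagnostic can list
               every token the current state would have accepted. */
            fprintf(stderr, "syntax error in state %d near %s; expected:",
                    states[sp], tok_name[t->type]);
            for (int k = 0; k < NTOKENS; k++)
                if (action[states[sp]][k].kind)
                    fprintf(stderr, " %s", tok_name[k]);
            fputc('\n', stderr);
            return 0;
        }
    }
}
```

Feeding it NUM(1) '+' NUM(2) '+' NUM(3) yields 6; feeding it NUM(1) '+'
with nothing after reports an error in state 3 and names NUM as the only
acceptable token, which is the "whole parse state is available for
diagnostics" advantage in miniature.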
-- 
Dale Worley	Cullinet Software		ARPA: cullvax!drw@eddie.mit.edu
UUCP: ...!seismo!harvard!mit-eddie!cullvax!drw
[Most people I know write lexers by hand, because it's so easy.  Lex does indeed
generate big slow lexers -- it's too powerful in the wrong way for lexing most
computer languages.  I've also heard that yacc is slow, but have never been
persuaded that it makes any difference.  What I'd really like to hear is how
you deal with Algol-68's two-level grammar without expanding it to a context
free grammar the size of a small planet.  I've heard of no work on parsing
such grammars directly.  -John]
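
For comparison, a hand-written lexer of the kind described above can be
little more than a test on the first character of each token.  A minimal
sketch for a hypothetical toy token set (identifiers, integers, `:=`,
single-character punctuation), not an Algol 68 scanner; `next_token` and
the `T_*` names are illustrative only.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds for a hypothetical toy language. */
enum { T_EOF, T_IDENT, T_NUMBER, T_ASSIGN, T_PUNCT };

typedef struct {
    int kind;
    const char *start;   /* points into the source buffer, no copying */
    size_t len;
} Tok;

/* Scans one token starting at *src and advances *src past it. */
Tok next_token(const char **src)
{
    const char *p = *src;
    Tok t;
    while (isspace((unsigned char)*p))          /* skip blanks and newlines */
        p++;
    t.start = p;
    if (*p == '\0') {
        t.kind = T_EOF;
    } else if (isalpha((unsigned char)*p)) {    /* letter (letter|digit)* */
        while (isalnum((unsigned char)*p))
            p++;
        t.kind = T_IDENT;
    } else if (isdigit((unsigned char)*p)) {    /* unsigned integer literal */
        while (isdigit((unsigned char)*p))
            p++;
        t.kind = T_NUMBER;
    } else if (p[0] == ':' && p[1] == '=') {    /* Algol-style assignment */
        p += 2;
        t.kind = T_ASSIGN;
    } else {                                    /* any other single character */
        p++;
        t.kind = T_PUNCT;
    }
    t.len = (size_t)(p - t.start);
    *src = p;
    return t;
}
```

Calling `next_token` repeatedly on `"x1 := 42;"` yields an identifier,
`:=`, a number, `;` and then `T_EOF`.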
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

johnl@ima.UUCP (08/17/87)

    At the Winter 1987 Usenix, Van Jacobson of LBL presented a
paper describing a much improved version of Lex.  He got the main
processing down to a single table lookup in memory!  (The rumor was
that it was just marginally slower than 'cat').  I don't know what the
current status of the project is; I would very much like either a copy
of his paper or the program itself.  Anyone know more than I?
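
The details of Jacobson's technique aren't given here, so the following
is only a generic sketch of what "one table lookup" per input character
looks like in a DFA-driven scanner.  The states, the table layout and the
names `dfa_init`/`dfa_match` are hypothetical, not taken from his paper
or from flex.

```c
#include <string.h>

/* Hypothetical DFA for identifiers and unsigned integers. */
enum { S_START, S_IDENT, S_NUMBER, S_DEAD, NSTATES };

static unsigned char delta[NSTATES][256];      /* next state per (state, byte) */
static const int accepting[NSTATES] = { 0, 1, 1, 0 };

/* Fills the transition table once; every unlisted entry goes to S_DEAD,
 * including byte 0, so scanning stops at the end of a C string. */
void dfa_init(void)
{
    memset(delta, S_DEAD, sizeof delta);
    for (int c = 0; c < 256; c++) {
        int letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
        int digit  = (c >= '0' && c <= '9');
        if (letter) {
            delta[S_START][c] = S_IDENT;
            delta[S_IDENT][c] = S_IDENT;
        }
        if (digit) {
            delta[S_START][c]  = S_NUMBER;
            delta[S_IDENT][c]  = S_IDENT;
            delta[S_NUMBER][c] = S_NUMBER;
        }
    }
}

/* Returns the length of the longest identifier or number at the start of
 * s (0 if none) and stores the final DFA state in *kind.  The inner loop
 * does exactly one table lookup per input byte. */
size_t dfa_match(const char *s, int *kind)
{
    const unsigned char *p = (const unsigned char *)s;
    int state = S_START, next;
    while ((next = delta[state][*p]) != S_DEAD) {
        state = next;
        p++;
    }
    *kind = state;
    return accepting[state] ? (size_t)(p - (const unsigned char *)s) : 0;
}
```

With this table, `dfa_match("count42+1", &k)` returns 7 with `k ==
S_IDENT`.  Every state reachable from `S_START` here is accepting, so no
backing up is needed; a general scanner must also remember the last
accepting position and back up to it when the DFA dies.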

  Randy Smith    @	NCI Supercomputer Facility
  c/o PRI, Inc.		
  PO Box B, Bldng. 430  Phone: (301) 698-5660                  
  Frederick, MD 21701  	Uucp: ...!uunet!mimsy!elsie!ncifcrf!randy

vern%lbl-helios@lbl-rtsg.arpa (Vern Paxson) (08/20/87)

>     At the Winter 1987 Usenix Van Jacobson of LBL labs presented a
> paper describing a much improved version of Lex...
> processing down to a single table lookup in memory!  (The rumor was
> that it was just marginally slower than 'cat').  I don't know what the
> current status of the project is; I would very much like either a copy
> of his paper or the program itself.  Anyone know more than I?

A student I'm supervising is adding Van's fast algorithm to my lex
re-write ("flex").  He's finished with the basics of the implementation,
but there's still a lot of tuning and clean-up before it'll be ready
for a beta-test and subsequent release.  (Details on distribution terms
are still being worked out, but it looks like it'll have a copyright that
says "freely redistribute, but don't make a significant enhancement
without contacting us first, and be willing to give UC rights to the
enhancement"; possibly it'll carry a more generous, GNU-like copyright.)

While there's still tuning to do, the preliminary results, done for a
C tokenizer, are (1) fast as cat?  No, not quite (I'll be going over
the implementation with Van to see where tuning might be needed); (2) fast
as a hand-coded scanner?  Well, as things stand now, it is about 15%
faster than PCC's tokenizer, which seems to have been done with some care.

	Vern Paxson				vern@lbl-csam.arpa
	Real Time Systems			ucbvax!lbl-csam.arpa!vern
	Lawrence Berkeley Laboratory		(415) 486-6411