[comp.unix] bug in SUN lex++; and whinge about fixed table sizes

otto@canon.co.uk (G. Paul Otto) (10/10/90)

I have just tried using SUN "yacc++" and "yacc" on a C++ grammar we
have (Roskind's).  They bomb out with "too many rules" - and there is
apparently no way of fixing them without source - which we (like most
people) do not have.
Yesterday, I tried using "lex++" on the lex file that goes with the
grammar.  Of course, the default table sizes are too small - but there
is an option to increase them (unlike in yacc!).  So I tried that.  It
turns out that I have a choice: set the table sizes too small - in
which case it bombs with a sensible error message - or set them large
enough - in which case it just dumps core.  Repeated trials are needed
to find this out, of course!

The good news: lex & bison can handle the files (after upping the table
sizes in lex, of course).  The bad news: this does make it more
difficult to use C++ with the yacc & lex grammars.

I am especially pissed off because the table sizes do not seem to have
been increased much (if at all) since the days when these progs ran on
PDP-11's - which meant that the max process size was 64k or 128k
(depending upon which model of PDP you had).  Process sizes 50 times
bigger than that are quite reasonable these days.  (And some people
have larger machines, and may wish to be able to take advantage of
them when convenient.)


This is not an isolated case: last I checked, the table sizes built
into refer were *smaller* than those which had been used on larger
PDPs.  (Of course, under SunOS 4.0, refer is crocked anyway ...)
Similarly, "vi", "sort" and "sed" have given me grief over line-length
limits; "awk" over record sizes; and other utilities too numerous to
mention (but including databases, X windows & so forth).


In these modern times, it is often just as easy to use table structures
which will grow as required (the wonders of modularity!) - why, oh why,
do they seem to be used so seldom?


[ Aside: GNU deserve an honourable mention here, for two reasons:
(i) in my experience, their software is less likely to hit arbitrary
limits; (ii) since they distribute source, it is easy to adjust the
table sizes if necessary. ]


Paul
-----------
Paul Otto, Canon Research Centre Europe Ltd.,
17-20 Frederick Sanger Rd, Surrey Research Park, Guildford, Surrey, GU2 5YD, UK.
NRS: otto@uk.co.canon	Internet: otto@canon.co.uk
UUCP: otto@canon.uucp	PATH: ..!ukc!uos-ee!canon!otto
Tel: +44 483 574325		Fax: +44 483 574360

bliss@sp64.csrd.uiuc.edu (Brian Bliss) (10/11/90)

Well, I do have access to the source for yacc...

When trying to parse pre-tokenized Fortran, I ran into problems
with having too many tokens for yacc on the Alliant FX/8 to
handle (> 128).  So I went into the source and changed the NTERM
constant appropriately.  Now everything compiled correctly,
but it bombed at run-time.  It turned out that I was manually
assigning values to the tokens that were larger than 256,
and yaccpar was bombing.  I couldn't change the token values
(in order to be compatible with someone else's code), so
I upped NTERM to 2056.  It still bombed at run-time.  It turned
out that there is a constant, YYFLAG (-1000 by default), such
that NTERM must be less than -YYFLAG.  Of course, this was
not documented in the source.  So I changed YYFLAG to -2000
(I wasn't using tokens in the range 2000-2056 - in retrospect,
I should have made it -3000).  Still bombed.  It turned out that
YYFLAG is defined in two places: once for the yacc source
(so that it generates the appropriate tables), and again
in yaccpar (the C code that is included with the generated
tables to form the parser).  Not only that, but I had to
redefine the NOMORE constant in the yacc source.

3 days later I got it to work...