[net.unix-wizards] anyone played with yacc?

rcj@clyde.UUCP (R. Curtis Jackson) (06/06/84)

Hi, netland -- this question involves a Vax 11/780 running USG 5.0,
but will probably apply to any 32-bit machine running almost any Unix.

I have a guy here at Whippany who wants to have a yacc input with a
large number of tokens in it, but yacc barfs after 127.  I looked through
the source code and found a define setting NTERMS to 127.  That is a
pretty suspicious-looking number, and I am wondering if anyone has
experimented with raising it: how does it interact with the other
defines in that section of the file 'dextern', and is there anything
in the code that would break as a result of increasing NTERMS to
something like 255 (i.e., any funny bit-masking, etc., that would
blow up)?

Any pointers (other than looking through source code -- that is a last
resort when you are already working 70-hour weeks....) would be
greatly appreciated.

Thanks,

andrew@orca.UUCP (Andrew Klossner) (06/07/84)

	"I have a guy here at Whippany who wants to have a yacc input
	with a large number of tokens in it, but yacc barfs after 127.
	I looked through the source code and found a define setting
	NTERMS to 127.  That is a pretty suspicious-looking number,
	and I am wondering if anyone has experimented with raising it:
	how does it interact with the other defines in that section of
	the file 'dextern', and is there anything in the code that
	would break as a result of increasing NTERMS to something like
	255 (i.e., any funny bit-masking, etc., that would blow up)?"

We had to do this to implement ANSI Basic, which has *lots* of
productions, terminals, and nonterminals.  I changed the relevant
portion of "dextern" to the following:

# define ACTSIZE 24000 /* was 12000 */
# define MEMSIZE 24000 /* was 12000 */
# define NSTATES 3000 /* was 750 */
# define NTERMS 400 /* was 127 */
# define NPROD 1500 /* was 600 */
# define NNONTERM 600 /* was 300 */
# define TEMPSIZE 3000 /* was 1200 */
# define CNAMSZ 10000 /* was 5000 */
# define LSETSIZE 800 /* was 600 */
# define WSETSIZE 800 /* was 350 */

I had no problem as long as I obeyed the following comment, which
occurs later in "dextern":

	/* relationships which must hold:
	TBITSET ints must hold NTERMS+1 bits...
	WSETSIZE >= NNONTERM
	LSETSIZE >= NNONTERM
	TEMPSIZE >= NTERMS + NNONTERMs + 1
	TEMPSIZE >= NSTATES
	*/
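
Something like the following throwaway check (just a sketch -- the
values are copied in by hand rather than pulled out of "dextern", and
it skips the TBITSET line, since checking that one needs the TBITSET
value and your int size) will tell you if the last four relationships
still hold:

	#include <stdio.h>

	# define NSTATES 3000
	# define NTERMS 400
	# define NNONTERM 600
	# define TEMPSIZE 3000
	# define LSETSIZE 800
	# define WSETSIZE 800

	int
	main()
	{
		/* complain about any relationship that no longer holds */
		if (WSETSIZE < NNONTERM)
			printf("WSETSIZE < NNONTERM\n");
		if (LSETSIZE < NNONTERM)
			printf("LSETSIZE < NNONTERM\n");
		if (TEMPSIZE < NTERMS + NNONTERM + 1)
			printf("TEMPSIZE < NTERMS + NNONTERM + 1\n");
		if (TEMPSIZE < NSTATES)
			printf("TEMPSIZE < NSTATES\n");
		printf("done\n");
		return 0;
	}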

Caveat: don't bother trying this on a machine that limits processes
to 64k of address space.  The resulting a.out file has 400k of BSS
space.

  -- Andrew Klossner   (decvax!tektronix!orca!andrew)      [UUCP]
                       (orca!andrew.tektronix@rand-relay)  [ARPA]

abe@ism780.UUCP (06/13/84)

In addition to producing a very large yacc executable, this approach
has the drawback that your grammar can then only be built with your
own locally hacked yacc.

Another approach, if you are somewhere near the 127-terminal limit,
is to have various local rules do their own keyword-parsing.  E.g. if
you had something like:

type_specifier:
		INT
	     |  LONG
	     |  CHAR
	     |  SHORT
		.
		.
		.

and the INT, LONG, etc. are not used elsewhere as terminals, you can
remove the definitions of these as tokens, and do something like:

type_specifier:	identifier
		{
			/* equal() is a local string-comparison helper */
			if (equal(yytext, "int"))
				...
			else if (equal(yytext, "long"))
				...
			else
				printf("illegal type specifier\n");
		}

Of course, you have to be careful that this doesn't introduce ambiguities
into the grammar.  The above example is probably bad since it deals with
a pretty significant part of the language; but there are probably
innocuous areas where the above procedure can be used, especially if
you've gone over the 127-token limit!
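
If the list of keywords handled this way gets long, a small lookup
table keeps the action from turning into an if-else ladder.  The
following is only a sketch -- the T_* codes, the table, and
type_code() are made-up names for illustration, not anything that
comes with yacc or lex:

	extern int strcmp();

	/* made-up codes for the type keywords */
	# define T_INT   1
	# define T_LONG  2
	# define T_CHAR  3
	# define T_SHORT 4

	struct kw {
		char *kw_name;
		int kw_code;
	};

	static struct kw typetab[] = {
		{ "int",   T_INT },
		{ "long",  T_LONG },
		{ "char",  T_CHAR },
		{ "short", T_SHORT },
		{ 0, 0 }
	};

	/* return the T_* code for s, or -1 if it is not a type keyword */
	int
	type_code(s)
	char *s;
	{
		register struct kw *p;

		for (p = typetab; p->kw_name; p++)
			if (strcmp(s, p->kw_name) == 0)
				return (p->kw_code);
		return (-1);
	}

The action in type_specifier can then just switch on
type_code(yytext) instead of stringing together equal() calls.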