rcj@clyde.UUCP (R. Curtis Jackson) (06/06/84)
Hi, netland -- this question involves a Vax 11/780 running USG 5.0, but it probably applies to any 32-bit machine running almost any Unix.

I have a guy here at Whippany who wants a yacc input with a large number of tokens in it, but yacc barfs after 127. I looked through the source code and found NTERMS defined as 127 in the file 'dextern'. That is a pretty suspicious-looking number. Has anyone experimented with it and found out how it interacts with the other defines in that section of 'dextern'? Also, is there any garbage in the code that would screw up as a result of increasing NTERMS to something like 255 (i.e., any funny bit-masking, etc., that would blow up)?

Any pointers (other than looking through source code -- that is a last resort when you are already working 70-hour weeks....) would be greatly appreciated.

Thanks,
andrew@orca.UUCP (Andrew Klossner) (06/07/84)
"I have a guy here at Whippany who wants a yacc input with a large number of tokens in it, but yacc barfs after 127. I looked through the source code and found NTERMS defined as 127 in the file 'dextern'. That is a pretty suspicious-looking number. Has anyone experimented with it and found out how it interacts with the other defines in that section of 'dextern'? Also, is there any garbage in the code that would screw up as a result of increasing NTERMS to something like 255 (i.e., any funny bit-masking, etc., that would blow up)?"

We had to do this to implement ANSI Basic, which has *lots* of productions, terminals, and nonterminals. I changed the relevant portion of "dextern" to the following:

    # define ACTSIZE  24000   /* was 12000 */
    # define MEMSIZE  24000   /* was 12000 */
    # define NSTATES  3000    /* was 750 */
    # define NTERMS   400     /* was 127 */
    # define NPROD    1500    /* was 600 */
    # define NNONTERM 600     /* was 300 */
    # define TEMPSIZE 3000    /* was 1200 */
    # define CNAMSZ   10000   /* was 5000 */
    # define LSETSIZE 800     /* was 600 */
    # define WSETSIZE 800     /* was 350 */

I had no problem as long as I obeyed the following comment, which occurs later in "dextern":

    /* relationships which must hold:
    TBITSET ints must hold NTERMS+1 bits...
    WSETSIZE >= NNONTERM
    LSETSIZE >= NNONTERM
    TEMPSIZE >= NTERMS + NNONTERMs + 1
    TEMPSIZE >= NSTATES
    */

Caveat: don't bother trying this if you have 64k processes. The resulting a.out file has 400k of BSS space.

-- Andrew Klossner (decvax!tektronix!orca!andrew) [UUCP]
                   (orca!andrew.tektronix@rand-relay) [ARPA]
abe@ism780.UUCP (06/13/84)
#R:clyde:-45000:ism780:14400010:000:1031
ism780!abe    Jun 11 11:00:00 1984

In addition to producing a very large yacc executable, this approach has the drawback that your system can only be built with your own locally hacked yacc.

Another approach, if you are somewhere near the 127-terminal limit, is to have various local rules do their own keyword parsing. E.g., if you had something like:

    type_specifier: INT
            | LONG
            | CHAR
            | SHORT
            .
            .
            .

and INT, LONG, etc. are not used elsewhere as terminals, you can remove the definitions of these as tokens and do something like:

    type_specifier: identifier
            {
                if (equal(yytext, "int")) ...
                else if (equal(yytext, "long")) ...
                else printf("illegal type specifier\n");
            }

Of course, you have to be careful that this doesn't introduce ambiguities into the grammar. The above example is probably a bad one, since it deals with a pretty significant part of the language; but innocuous areas where this procedure can be used are probably present, especially if you've gone over the 127-token limit!
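The chain of string comparisons in such an action can be factored into a small lookup helper. This is a minimal sketch, not code from the post: the enum, the table, and the name classify_type are hypothetical. It also assumes the lexer passes a saved copy of the identifier text through the semantic value, since yytext may already hold the lookahead token by the time the action runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for the keywords folded into "identifier". */
enum type_kind { TYPE_NONE = 0, TYPE_INT, TYPE_LONG, TYPE_CHAR, TYPE_SHORT };

/*
 * Map identifier text to a type keyword, or TYPE_NONE if it is not one.
 * The caller (the grammar action) reports the error for TYPE_NONE.
 */
static enum type_kind classify_type(const char *text)
{
    static const struct {
        const char *name;
        enum type_kind kind;
    } table[] = {
        { "int",   TYPE_INT   },
        { "long",  TYPE_LONG  },
        { "char",  TYPE_CHAR  },
        { "short", TYPE_SHORT },
    };
    size_t i;

    assert(text != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(text, table[i].name) == 0)
            return table[i].kind;
    }
    return TYPE_NONE;
}
```

In the action, the call might look like classify_type($1), with $1 being the saved identifier string and the "illegal type specifier" message printed when the result is TYPE_NONE. Keeping the keywords in one table also makes it easier to see which names have been taken out of the token list.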