sandel@tuvalu.sw.mcc.com (Charles Sandel) (06/23/89)
One of our programmers found a bug in the 'lex' source. The bug report follows.

We get various error messages from Andrew's class. I finally tracked down the problem: a bug in lex. class uses lex for its lexical input, in particular to recognize the class keywords like InitializeObject and FinalizeObject. InitializeObject was not being recognized.

Tracing through the state machine that lex generates and then uses to parse the input, it turns out that the highest-numbered state in the machine is *exactly* 255. During construction, the states are numbered 0 to 255. In cmd/lex/sub2.c, line 797, lex checks whether it needs a byte (char) or a larger quantity to store the states:

    fprintf(fout,"# define YYTYPE %s\n",stnum+1 > NCH ? "int" : "char");

where NCH is 256 (the number of characters). Notice that stnum+1 is compared rather than stnum (the highest state number). This is because the zero state is used as an error state, so the states 0..255 are shifted up by 1 to 1..256. stnum is 255, so stnum+1 is 256. stnum+1 is not greater than NCH (which is 256), since they are equal, and so a char is used for YYTYPE, which holds the state number. As a result, lex creates tables which store state 256 (old state 255+1) in a char. In an 8-bit char that value truncates to zero, the error state, and the lexical token ending in that state is not recognized.

I will submit a bug report on this today.

In fact this code is badly written in several respects. First, the comparison should not be against NCH. The value of NCH varies according to whether NLS (with an 8-bit character code) is used or not. However, the largest number that can be stored in a char depends on the size of the char, not on whether a 7-bit or 8-bit character code is used. Further, we want the smallest storage unit that will hold all the state values. Thus the code should be:

    fprintf(fout,"# define YYTYPE %s\n",
        (stnum+1 <= 0xFF) ? "unsigned char" :
        (stnum+1 <= 0xFFFF) ? "unsigned short" : "unsigned long");
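For anyone who wants to check the arithmetic without rebuilding lex, here is a small sketch of the two selection tests side by side. It is not taken from the lex source: the helper names (yytype_original, yytype_proposed, store_in_byte) are made up for illustration, NCH is fixed at 256 as in the report, and the byte store is modelled as exactly 8 bits.

```c
/* Hypothetical helper: the test from sub2.c as quoted in the report,
 * with NCH fixed at 256 (an assumption matching the report). */
static const char *yytype_original(int stnum)
{
    const int NCH = 256;
    return (stnum + 1 > NCH) ? "int" : "char";
}

/* Hypothetical helper: the replacement test proposed in the report.
 * It assumes an 8-bit char and a 16-bit short, as the constants imply. */
static const char *yytype_proposed(long stnum)
{
    return (stnum + 1 <= 0xFF)   ? "unsigned char"
         : (stnum + 1 <= 0xFFFF) ? "unsigned short"
                                 : "unsigned long";
}

/* Hypothetical helper: the value left behind when a shifted state
 * number is stored in 8 bits (only the low byte survives). */
static unsigned store_in_byte(int shifted_state)
{
    return (unsigned)(shifted_state & 0xFF);
}
```

With stnum equal to 255, yytype_original picks "char" and store_in_byte(255 + 1) comes back as 0, which is the error state. yytype_proposed picks "unsigned short" for the same input, and still picks "unsigned char" when stnum is 254, the largest case in which every shifted state fits in a byte.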