[comp.compilers] lex error "Compiler Design and Construction"

rdfloyd@ceaport.UUCP (Randy Floyd) (08/10/90)

In "Compiler Design and Construction" by Arthur B. Pyster (second edition)

On page 67 there is a small lex specification for part of the C syntax,
but when I try to run it through flex I get an error:

Syntax error at line 38: bad iteration values

The definition of char and line 38 follow:

char \'([^'\n]|\\[ntrbrf'\n]|\\0[0-7]{0,2})+\'

line 38:

{char}		return token(CHARLIT);

now!!!

Can a kind soul please give me a rundown of the definition line
for char and tell me why I might be getting this message?

BTW: I am new to lex/yacc but learning.
Thanks for any help!!

rdfloyd@ceaport
[This is a pretty gross way to define a character constant, but I don't
see anything obviously wrong with it, other than that it matches things
like '\000' ambiguously. -John]
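The ambiguity mentioned above can be counted directly. The following C sketch is not from the book; `count_parses`, `is_octal` and `MAX_INTERIOR` are hypothetical names. It uses a small dynamic-programming table to count how many ways the text between the quotes can be split into pieces matched by the definition's three alternatives.

```c
#include <stddef.h>
#include <string.h>

#define MAX_INTERIOR 63     /* hypothetical size limit for this sketch */

static int is_octal(char c)
{
    return c >= '0' && c <= '7';
}

/* Counts the ways the text between the quotes of a character constant can
 * be split into pieces matched by the three alternatives of the book's
 * definition:
 *   [^'\n]            one character other than a quote or newline
 *   \\[ntrbrf'\n]     a backslash and one escape letter
 *   \\0[0-7]{0,2}     a backslash, a 0, and up to two more octal digits
 * The '+' requires at least one piece.  A result above 1 means the
 * pattern matches that text ambiguously; 0 means it does not match
 * (or the text is longer than MAX_INTERIOR). */
static unsigned long count_parses(const char *s)
{
    size_t len = strlen(s);
    unsigned long ways[MAX_INTERIOR + 1];   /* ways[i]: parses of s[i..] */
    size_t i, k;

    if (len == 0 || len > MAX_INTERIOR)
        return 0;

    ways[len] = 1;                          /* empty remainder: one way */
    for (i = len; i-- > 0; ) {
        ways[i] = 0;
        if (s[i] != '\'' && s[i] != '\n')   /* [^'\n] */
            ways[i] += ways[i + 1];
        if (s[i] == '\\' && i + 1 < len) {
            /* same set as [ntrbrf'\n]; the repeated r changes nothing */
            if (strchr("ntrbf'\n", s[i + 1]) != NULL)
                ways[i] += ways[i + 2];
            if (s[i + 1] == '0') {          /* \\0[0-7]{0,2} */
                ways[i] += ways[i + 2];
                for (k = 0; k < 2 && i + 2 + k < len && is_octal(s[i + 2 + k]); k++)
                    ways[i] += ways[i + 3 + k];
            }
        }
    }
    return ways[0];
}
```

For the interior \000 (the C string "\\000") it returns 4: the backslash as an ordinary character followed by three zeros, or \0, \00 or \000 as an escape followed by whatever zeros remain. Even a plain \n comes back as 2, because [^'\n] is also willing to take the backslash on its own.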
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus| world}!esegue.  Meta-mail to compilers-request@esegue.

vern@cs.cornell.edu (Vern Paxson) (08/14/90)

In article <136@ceaport.UUCP> rdfloyd@ceaport.UUCP (Randy Floyd) writes:
> In "Compiler Design and Construction" by Arthur B. Pyster (second edition)
> 
> On page 67 there is a small lex for part of C syntax but when I try to run
> it through FLEX I get an error.
> 
> Syntax error at line 38: bad iteration values
> 
> the definition of char and line 38 follow:
> 
> char \'([^'\n]|\\[ntrbrf'\n]|\\0[0-7]{0,2})+\'
> ...
> Can a kind soul please give me a rundown of the definition line
> for char and tell me why I might be getting this message?

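It seems likely that you are running a very old version of flex.  Older
releases could not handle a repeat count with a lower bound of 0, such as
the {0,2} in your definition; that bug was fixed sometime prior to the
flex 2.1 release of June 1989.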
The 2.3 release should be turning up on comp.sources.unix real soon now.
You can get around the limitations of the old release by writing the
definition as:

	char \'([^'\n]|\\[ntrbrf'\n]|\\0([0-7]{1,2})?)+\'
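POSIX extended regular expressions accept {0,2}, so one way to spot-check that the rewrite changes nothing is to compile both forms with the POSIX `regcomp()`/`regexec()` functions and compare them on a few sample constants. This is a sketch assuming a POSIX system with `<regex.h>`; `BOOK_CHAR`, `WORKAROUND_CHAR` and `full_match` are hypothetical names. The patterns are anchored with ^...$ and C-escaped, and lex's \' becomes a plain ' because a quote is not special in a POSIX ERE.

```c
#include <stddef.h>
#include <regex.h>

/* The book's definition, using {0,2}. */
static const char BOOK_CHAR[] =
    "^'([^'\n]|\\\\[ntrbrf'\n]|\\\\0[0-7]{0,2})+'$";

/* The workaround, using ([0-7]{1,2})? instead. */
static const char WORKAROUND_CHAR[] =
    "^'([^'\n]|\\\\[ntrbrf'\n]|\\\\0([0-7]{1,2})?)+'$";

/* Returns 1 if all of text matches pattern, 0 if it does not,
 * and -1 if the pattern fails to compile. */
static int full_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Both patterns should agree on every sample, for example accepting '\0' and '\012' and rejecting '' and a quote followed directly by a newline.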

Note also that if you're requiring octal escape sequences to start with
a leading 0, then you'd better allow up to three digits after it, or else
you'll be limited to \077 = ASCII 63!  So the {0,2} in the original
example should be {0,3}.  It is better to allow the escape to start with
either a 0 or a 1, or (better still) to allow any sequence of octal digits
and explicitly check that the value is within a valid range (after all,
C allows '\56' as an octal character constant).
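The explicit range check can be a few lines of C. A minimal sketch, assuming 8-bit chars (so the limit is 0xff) and that `digits` holds the text after the backslash; `octal_escape_ok` is a hypothetical name:

```c
#include <stddef.h>

/* Accepts any non-empty string of octal digits whose value fits in an
 * 8-bit char, storing that value in *value.  Stops as soon as the value
 * exceeds 0xff, so long digit strings cannot overflow. */
static int octal_escape_ok(const char *digits, unsigned *value)
{
    unsigned v = 0;
    size_t n;

    for (n = 0; digits[n] >= '0' && digits[n] <= '7'; n++) {
        v = v * 8u + (unsigned) (digits[n] - '0');
        if (v > 0xffu)
            return 0;               /* out of range for an 8-bit char */
    }
    if (n == 0 || digits[n] != '\0')
        return 0;                   /* empty, or a non-octal digit such as 8 */
    *value = v;
    return 1;
}
```

It accepts 077 (63), 0377 (255) and the digits of '\56' (46), and rejects 777 (511) and 48.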

>[This is a pretty gross way to define a character constant, but I don't
>see anything obviously wrong with it, other than that it matches things
>like '\000' ambiguously. -John]

Yes.  Because the '+' operator applies to the entire interior of the
character constant, and [^'\n] already matches a backslash and any
ordinary character after it, the original definition matches exactly
the same strings as

	char \'([^'\n]|\\['\n])+\'

anyway.  Neither of these definitions is right, though: they will match

	'\\''

since the first \ inside the character constant will match the [^'\n]
pattern and then the \' sequence will match \\['\n], even though '\\' is
already a complete constant.  To force only legal escape sequences to be
recognized, something like

	char \'([^'\n\\]|\\[ntrbrf'\n\\]|\\0[0-7]{0,3})+\'

is needed.  Forcing the scanner to only match correctly formed character
constants is often a mistake, though, since it makes detection and
reporting of illegal constants more difficult.  An alternative is to
match character constants one character at a time, using start
conditions.  Something like:

	%{
	#define MAX_CHAR_CONST 64	/* assumed buffer size; pick your own */
	%}
	%x chcon

	%%
			/* static, so the buffer stays valid after yylex() returns */
			static char charbuf[MAX_CHAR_CONST];
			static char *charbuf_ptr;


	'		charbuf_ptr = charbuf; BEGIN(chcon);

	<chcon>'	{ /* saw closing quote - all done */
			BEGIN(INITIAL);
			*charbuf_ptr = '\0';
			/* return character constant token type and
			 * value to parser
			 */
			}

	<chcon>\n	{
			/* error - unterminated character constant */
			/* generate error message, then resume normal scanning */
			BEGIN(INITIAL);
			}

	<chcon>\\[0-7]{1,3}	{	/* octal escape sequence */
			unsigned int result;

			(void) sscanf( yytext + 1, "%o", &result );

			if ( result > 0xff )
				{
				/* error, constant is out-of-bounds */
				}
			else
				*charbuf_ptr++ = result;
			}

	<chcon>\\[0-9]+	{
			/* generate error - bad escape sequence; something
			 * like '\48' or '\0777777'
			 */
			}

	<chcon>\\n	*charbuf_ptr++ = '\n';
	<chcon>\\t	*charbuf_ptr++ = '\t';
	<chcon>\\r	*charbuf_ptr++ = '\r';
	<chcon>\\b	*charbuf_ptr++ = '\b';
	<chcon>\\f	*charbuf_ptr++ = '\f';

	<chcon>\\(.|\n)	*charbuf_ptr++ = yytext[1];

	<chcon>.	*charbuf_ptr++ = yytext[0];
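To see what the <chcon> rules do without running flex, here is a minimal plain-C sketch of the same one-character-at-a-time decoding. `decode_char_const` and the `CC_` codes are hypothetical names; unlike the scanner above, it also guards against overflowing its buffer, and it treats the end of the string like the newline rule.

```c
#include <stddef.h>

/* Hypothetical error codes for this sketch. */
enum {
    CC_UNTERMINATED  = -1,  /* newline or end of input before the closing quote */
    CC_BAD_ESCAPE    = -2,  /* something like '\48' or '\0777777' */
    CC_OUT_OF_RANGE  = -3,  /* octal value above 0xff */
    CC_TOO_LONG      = -4,  /* decoded text does not fit in buf */
    CC_NO_OPEN_QUOTE = -5   /* src does not start with a quote */
};

/* Decodes the character constant at src one character at a time, making the
 * same decisions as the <chcon> rules.  On success, stores the decoded bytes
 * (NUL-terminated) in buf, points *end just past the closing quote and
 * returns the number of bytes decoded; on failure returns a CC_ code. */
static int decode_char_const(const char *src, char *buf, size_t bufsize,
                             const char **end)
{
    const char *p = src;
    size_t len = 0;

    if (*p != '\'')
        return CC_NO_OPEN_QUOTE;
    if (bufsize == 0)
        return CC_TOO_LONG;
    p++;

    while (*p != '\'') {                    /* <chcon>' ends the constant */
        int value;

        if (*p == '\n' || *p == '\0')       /* <chcon>\n, or end of input */
            return CC_UNTERMINATED;

        if (*p == '\\') {
            size_t ndig = 0, i;

            p++;
            while (p[ndig] >= '0' && p[ndig] <= '9')
                ndig++;
            if (ndig > 0) {
                /* Longest match: \\[0-7]{1,3} wins only if it covers the
                 * whole digit run; otherwise the \\[0-9]+ error rule is
                 * longer. */
                if (ndig > 3)
                    return CC_BAD_ESCAPE;
                value = 0;
                for (i = 0; i < ndig; i++) {
                    if (p[i] > '7')
                        return CC_BAD_ESCAPE;
                    value = value * 8 + (p[i] - '0');
                }
                if (value > 0xff)
                    return CC_OUT_OF_RANGE;
                p += ndig;
            } else {
                switch (*p) {
                case 'n':  value = '\n'; break;
                case 't':  value = '\t'; break;
                case 'r':  value = '\r'; break;
                case 'b':  value = '\b'; break;
                case 'f':  value = '\f'; break;
                case '\0': return CC_UNTERMINATED;
                default:   value = (unsigned char) *p; break;  /* \\(.|\n) */
                }
                p++;
            }
        } else {
            value = (unsigned char) *p++;   /* <chcon>. */
        }

        if (len + 1 >= bufsize)             /* keep room for the final NUL */
            return CC_TOO_LONG;
        buf[len++] = (char) value;
    }

    buf[len] = '\0';
    if (end != NULL)
        *end = p + 1;
    return (int) len;
}
```

Because it stops at the first unescaped quote, the input '\\'' decodes as the one-character constant '\\', with *end left pointing at the stray final quote, instead of being swallowed whole as the single-pattern definitions do.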

		Vern

	Vern Paxson			      vern@cs.cornell.edu
	Computer Science Dept.		      decvax!cornell!vern
	Cornell University