[comp.sources.d] Lex, white space, and my 'check' program

nick@ccicpg.UUCP (Nick Crossley) (05/13/88)
I just found another bug in the lex part of my C checker program,
posted to the net a month or so back.  It might well affect other
programs as well.

I had a rule in the lex file for skipping white space :-

[ \t\n]+	;

(That was later enhanced to ignore \f and \r by request).

The problem is that this will break lex (by overflowing the fixed-size
yytext buffer) if the white space to be ignored is long enough.  Who writes
several hundred consecutive spaces or newlines in a C program, you ask?
Unfortunately the preprocessor can and does when it is processing a header
file full of #defines and comments, because the directives and comments it
removes leave blank lines behind.  The symptoms are unpredictable; in the
case of my check program, it started reporting spurious syntax errors
after I added a couple more #defines to a header file.

The quick fix is simple: remove the '+' from the rule, so that lex matches
and ignores each white space character individually.  I have not done any
timings, but this is presumably less efficient than the rule with the '+',
since the scanner now restarts for every single character.  If so, a better
fix would be to have a rule of the form :-

[ \t\f\r\n]	{ skipspace(); }

where skipspace() reads and discards characters (using lex's input())
until a significant character appears, and then, obviously, pushes that
character back with unput().
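A minimal sketch of such a skipspace(), written as standalone C so it can be tried outside lex.  The name matches the rule above, but the body is an assumption: it reads from a stdio stream with getc() and pushes back with ungetc(), standing in for lex's input() and unput().  Because characters are discarded rather than stored, the length of the white space run no longer matters.

```c
#include <stdio.h>

/* Sketch only: stdio getc()/ungetc() stand in for lex's input()/unput(). */

/* Nonzero for the white space characters the rule ignores. */
static int is_skippable(int c)
{
    return c == ' ' || c == '\t' || c == '\f' || c == '\r' || c == '\n';
}

/* Discard white space from fp and return how many characters were
 * skipped.  The first significant character, if any, is pushed back
 * so the next read sees it.  Nothing is buffered, so the run of white
 * space may be of any length. */
static long skipspace(FILE *fp)
{
    long skipped = 0;
    int c;

    while ((c = getc(fp)) != EOF && is_skippable(c))
        skipped++;
    if (c != EOF)
        ungetc(c, fp);          /* push back the significant character */
    return skipped;
}
```

Inside a lex file the loop would call input() and unput(c) instead.  Traditional lex's input() returns 0 rather than EOF at end of file, so the loop test has to allow for that.  Since the rule has already consumed the first white space character, skipspace() only has to deal with the rest of the run.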

In retrospect this bug is obvious, and it is analogous to the usual advice
about comments and strings: you don't match them with a single lex
pattern, because they too can be arbitrarily long.  However, the pattern
seemed so simple, and very long runs of white space seemed so unlikely,
that the problem did not occur to me when I wrote the lex rule.
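The same idea applies to comments: let a rule match only the opening /* and hand the rest to a routine that reads one character at a time.  The sketch below uses a hypothetical skipcomment() helper and stdio getc() in place of lex's input(); string literals could be handled the same way, with extra care for backslash escapes.

```c
#include <stdio.h>

/* Hypothetical helper, called after a lex rule has matched the opening
 * comment delimiter.  It discards characters up to and including the
 * closing delimiter, so comment length never touches yytext.
 * Sketch only: stdio getc() stands in for lex's input().
 * Returns 0 when the comment is closed, -1 on end of file inside it. */
static int skipcomment(FILE *fp)
{
    int c, prev = 0;

    while ((c = getc(fp)) != EOF) {
        if (prev == '*' && c == '/')
            return 0;           /* closing delimiter found */
        prev = c;
    }
    return -1;                  /* unterminated comment */
}
```

Starting with prev = 0 means that input such as "/*/" is not mistaken for a complete comment.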

-- 

<<< standard disclaimers >>>
Nick Crossley, CCI, 9801 Muirlands, Irvine, CA 92718-2521, USA
Tel. (714) 458-7282,  uucp: ...!uunet!ccicpg!nick