[net.sources.bugs] C grammar

aeb@turing.UUCP (02/12/85)

Having the personality of a kumquat and wanting to befuddle my friends,
I cranked some of my C programs through the recently posted parser.
Unfortunately, it complained about correct fragments like

    ...
    if ((*buf == ':') || (('a' <= *buf) && ('z' >= *buf)))
    ...

because the old rule '.*' is greedy: it scanned everything from ':') || ...
through ('z' as one single character constant.
Also, the handling of backslashes in strings was not quite correct.
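
To make the character constant problem concrete, here is a minimal
hand-written C sketch of what the old and new rules accept. It is not part
of scan.l: the names old_char_const and new_char_const are made up for
illustration, and the first function assumes lex's usual behaviour (longest
match wins, and '.' does not match a newline).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Old rule  '.*'  : with longest match, the token runs from the opening
   quote to the LAST quote on the same line. Returns the token length,
   or 0 if the rule does not match at s. */
size_t old_char_const(const char *s)
{
    assert(s != NULL);
    if (s[0] != '\'')
        return 0;
    size_t eol = strcspn(s, "\n");          /* '.' never matches newline */
    for (size_t i = eol; i-- > 1; )         /* look backwards for a quote */
        if (s[i] == '\'')
            return i + 1;
    return 0;
}

/* New rule  '(\\.|[^\\'])+'  : the token ends at the first quote that is
   not preceded by an escaping backslash. Returns the token length, or 0. */
size_t new_char_const(const char *s)
{
    size_t i = 1;
    assert(s != NULL);
    if (s[0] != '\'')
        return 0;
    while (s[i] != '\0' && s[i] != '\'') {
        if (s[i] == '\\') {
            /* \\. : a backslash plus any one character except newline */
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;
            i += 2;
        } else {
            i += 1;                         /* [^\\'] : any other character */
        }
    }
    if (s[i] != '\'' || i == 1)             /* the + needs at least one item */
        return 0;
    return i + 1;
}
```

Started at the first quote of the fragment above, old_char_const runs all
the way to the quote after the z, while new_char_const stops after ':' with
a length of 3.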

The diff below shows how one might change scan.l in order to improve
the parser's behaviour. I am not sure about the + . Are multi-character
character constants permitted by the new standard? (I hope not.)

*** scan.l.orig	Tue Feb 12 11:38:07 1985
--- scan.l	Tue Feb 12 17:16:09 1985
***************
*** 56,62
  0{D}+{US}?{LS}?		{ count(); return(CONSTANT); }
  {D}+{LS}?{US}?		{ count(); return(CONSTANT); }
  {D}+{US}?{LS}?		{ count(); return(CONSTANT); }
! '.*'			{ count(); return(CONSTANT); }
  
  {D}+{E}{LS}?		{ count(); return(CONSTANT); }
  {D}*"."{D}+({E})?{LS}?	{ count(); return(CONSTANT); }

--- 56,62 -----
  0{D}+{US}?{LS}?		{ count(); return(CONSTANT); }
  {D}+{LS}?{US}?		{ count(); return(CONSTANT); }
  {D}+{US}?{LS}?		{ count(); return(CONSTANT); }
! '(\\.|[^\\'])+'		{ count(); return(CONSTANT); }
  
  {D}+{E}{LS}?		{ count(); return(CONSTANT); }
  {D}*"."{D}+({E})?{LS}?	{ count(); return(CONSTANT); }
***************
*** 62,68
  {D}*"."{D}+({E})?{LS}?	{ count(); return(CONSTANT); }
  {D}+"."{D}*({E})?{LS}?	{ count(); return(CONSTANT); }
  
! \"(\\\"|[^"])*\"	{ count(); return(STRING_LITERAL); }
  
  ">>="			{ count(); return(RIGHT_ASSIGN); }
  "<<="			{ count(); return(LEFT_ASSIGN); }

--- 62,68 -----
  {D}*"."{D}+({E})?{LS}?	{ count(); return(CONSTANT); }
  {D}+"."{D}*({E})?{LS}?	{ count(); return(CONSTANT); }
  
! \"(\\.|[^\\"])*\"	{ count(); return(STRING_LITERAL); }
  
  ">>="			{ count(); return(RIGHT_ASSIGN); }
  "<<="			{ count(); return(LEFT_ASSIGN); }
-- 
      Andries Brouwer -- CWI, Amsterdam -- {philabs,decvax}!mcvax!aeb
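
One reading of the string rule change in the diff above: the old rule knows
about \" but not about \\, so a quote that follows an escaped backslash is
still treated as escaped. The C sketch below mimics both rules by hand. The
names old_string and new_string are hypothetical, and the old rule is
modelled under the assumption that lex picks the longest match.

```c
#include <assert.h>
#include <stddef.h>

/* Old rule  \"(\\\"|[^"])*\"  : an interior quote is accepted whenever the
   character before it is a backslash, even if that backslash was itself
   escaped. Returns the length of the longest match, or 0 if none. */
size_t old_string(const char *s)
{
    size_t last = 0;
    assert(s != NULL);
    if (s[0] != '"')
        return 0;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (s[i] != '"')
            continue;
        last = i + 1;               /* this quote could close the string */
        if (s[i - 1] != '\\')
            break;                  /* it cannot be an interior \" */
    }
    return last;
}

/* New rule  \"(\\.|[^\\"])*\"  : a backslash always consumes the next
   character, so \\ is one unit and the quote after it ends the string. */
size_t new_string(const char *s)
{
    size_t i = 1;
    assert(s != NULL);
    if (s[0] != '"')
        return 0;
    while (s[i] != '\0' && s[i] != '"') {
        if (s[i] == '\\') {
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;           /* '.' does not match a newline */
            i += 2;
        } else {
            i += 1;
        }
    }
    return s[i] == '"' ? i + 1 : 0;
}
```

On the input  "\\", "abc"  the old sketch returns 7 (it swallows the comma
and the opening quote of the next string), while the new one returns 4.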

jeff@gatech.UUCP (Jeff Lee) (02/19/85)

> The diff below shows how one might change scan.l in order to improve
> the parser's behaviour. I am not sure about the + . Are multi-character
> character constants permitted by the new standard? (I hope not.)

Yes. In the new standard, multi-character character constants are allowed,
but (of course) their value is implementation-defined. A character constant
has type int, so you can guess what most implementations will do.
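
A small C sketch of what that means in practice. The helper names are made
up for illustration. Only the type is fixed by the language; the value of
'ab' is whatever the implementation chooses, and many compilers warn when
they see one. GCC, for example, documents that it builds the value one
character at a time, shifting the previous value left by the width of a
char before or-ing in the next character.

```c
#include <assert.h>
#include <stddef.h>

/* In C (unlike C++), a character constant has type int. */
size_t char_const_size(void)      { return sizeof('a'); }
size_t multichar_const_size(void) { return sizeof('ab'); }

/* The value is implementation-defined: portable code should not depend
   on it. Expect a compiler warning on this line. */
int multichar_value(void)
{
    return 'ab';
}

/* Checks only what the language guarantees: the type, not the value. */
void check_sizes(void)
{
    assert(char_const_size() == sizeof(int));
    assert(multichar_const_size() == sizeof(int));
}
```

Calling check_sizes() should pass on any conforming C compiler, whereas
printing multichar_value() may give different numbers on different ones.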

-- 
Jeff Lee
CSNet:	Jeff @ GATech		ARPA:	Jeff.GATech @ CSNet-Relay
uucp:	...!{akgua,allegra,rlgvax,sb1,unmvax,ulysses,ut-sally}!gatech!jeff

henry@utzoo.UUCP (Henry Spencer) (02/21/85)

> Yes. In the new standard, multi-character character constants are allowed,
> but (of course) their value is implementation-defined. A character constant
> has type int, so you can guess what most implementations will do.

Already do, you mean.  There's nothing new about multi-character character
constants; they've been in C all along, with the same warnings about them
being highly implementation-dependent.  I'm a bit disappointed that the
ANSI committee didn't delete them.  Oh well.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry