ray@emacs.uucp (Ray Reeves) (06/25/85)
I'm new to lex, and the first thing I tried wouldn't fly!
How do you recognise a PL/1 style comment?
This is what I did:

startcom \/\*
endcom \*\/
%%
{startcom}[^{endcom}]{endcom} printf("%s","comment")

But negation doesn't take conjunctions, even when packaged like this.
What should I do?
-- 
Ray Reeves,
CCA-UNIWORKS,20 William St,Wellesley, Ma. 02181. (617)235-2600
emacs!ray@CCA-UNIX
sambo@ukma.UUCP (Inventor of micro-S) (06/29/85)
In article <114@emacs.uucp> ray@emacs.uucp (Ray Reeves) writes:
>This is what I did:
>
>startcom \/\*
>endcom \*\/
>%%
>{startcom}[^{endcom}]{endcom} printf("%s","comment")

Here is what I do. If there is a better way, let me know.

%%
"/*"[^*\n]*     {
                  int c, i;

                  if ((c = input ()) == '*')
                    if ((c = input ()) == '/')
                    {
                      /* Have found a comment. */
                    } /* if ((c = input ()) == '/') */
                    else
                    {
                      unput (c);
                      unput ('*');
                      unput ('/');
                      /* This makes lex think that the very next thing on
                         the input is also a comment. */
                    } /* else - if ((c = input ()) == '/') */
                  else
                  { /* found '\n' */
                    unput ('*');
                    unput ('/');
                    /* Processing may be different upon reaching the end of
                       line - in my compiler I keep track of which line of
                       text I am looking at. Also, by doing things this way
                       I am hoping to avoid overflowing the input buffer. */
                  } /* else - first if ((c */
                }

-----------------------------------------
Samuel A. Figueroa, Dept. of CS, Univ. of KY, Lexington, KY 40506-0027
ARPA: ukma!sambo<@ANL-MCS>, or sambo%ukma.uucp@anl-mcs.arpa,
      or even anlams!ukma!sambo@ucbvax.arpa
UUCP: {ucbvax,unmvax,boulder,oddjob}!anlams!ukma!sambo,
      or cbosgd!ukma!sambo
"Micro-S is great, if only people would start using it."
haahr@jendeh.UUCP (Paul Haahr) (07/01/85)
> How do you recognise a PL/1 style comment?
"/*"([^*]|"*"[^/])*"*/"
Paul Haahr
..!allegra!princeton!jendeh!haahr
dudek@harvard.ARPA (Glen Dudek) (07/04/85)
What I consider the "best" (read "most efficient") way to eat C or PL-1 style comments in lex is as in the ANSI-C standard yacc/lex grammar recently posted to the net:

%%
"/*"    { comment(); printf("comment"); }
%%
comment()
{
        char c, c1;

loop:
        while ((c = input()) != '*' && c != 0)
                putchar(c);

        if ((c1 = input()) != '/' && c != 0)
        {
                unput(c1);
                goto loop;
        }

        if (c != 0)
                putchar(c1);
}

My favorite way is to use the following lex expression:

%%
"/*"("/"|("*"*[^*/]))*"*"+"/"   { printf("comment"); }
%%

Although it may make your head hurt, it's interesting to figure out.

Glen Dudek
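One way to "figure out" the expression above is to read it as a two-state machine: after the opening "/*", stay in the body until a star is seen, then either close on "/", stay on further stars, or fall back to the body on anything else. The following is a minimal sketch of that reading in plain C, not code from the thread; the function name comment_len and its string-based interface are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper (not from the thread): returns the length of the
 * comment that starts at s[0], or 0 if s does not begin with a complete
 * comment. */
size_t comment_len(const char *s)
{
    enum { BODY, STAR } state = BODY;
    size_t i;

    /* The opening delimiter must be the first two characters. */
    if (s[0] != '/' || s[1] != '*')
        return 0;

    for (i = 2; s[i] != '\0'; i++) {
        if (state == BODY) {
            if (s[i] == '*')
                state = STAR;   /* start of a run of stars */
        } else {
            if (s[i] == '/')
                return i + 1;   /* star run followed by slash: closed */
            if (s[i] != '*')
                state = BODY;   /* anything but star or slash: body again */
        }
    }
    return 0;                   /* ran out of input: unterminated */
}
```

Under this sketch, comment_len("/***/ */") is 5 and comment_len("/*/") is 0, since the star of the opening delimiter is not allowed to double as the star of the closing one.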
ray@emacs.uucp (Ray Reeves) (07/11/85)
Thanks for all the contributions on this subject.

Many people recommended embedding C in Lex to solve my problem, pointing out some precedents for this. This, of course, is tantamount to saying that Lex can't hack it, and indeed amdahl!drivax!alan says I shouldn't expect a finite state machine to do so.

Paul Haahr of Princeton made a snappy answer which was:

"/*"([^*]|"*"[^/])*"*/"

but Glen Dudek of Harvard pointed out that this fails for /***/, and it should have been:

"/*"("/"|("*"*[^*/]))*"*"+"/"

Several people pointed out the hazard of enormous block comments, and McQueer steered me to the use of START transitions, which I decided was sound advice when I discovered the yymore function. I append the Lex code that I have arrived at, a program to which you all contributed.

My special problem is a lexical processor for a PL/1 pretty-printer, and in this environment people typically have enormous comment blocks with some sort of pattern or table in them. Thus, although code can be torn to shreds and reformatted, block comments must be left undisturbed.

The problem of large blocks is solved by entering a "comment" mode and tokenising each line separately. The residual problem is that the first line of such a block has to respect leading white space even before the comment starts. This is solved by tokenising the whole line, not just the comment part. Elsewhere, white space is discarded.

To my astonishment, a fault in Lex showed up which almost crippled me. It is impossible to recognise just one \n character under a START mode, although you can in normal mode. Thus, my last rule looks for [\n]+ followed by any character and then unputs that character. Is this a known wart?

startcom \/\*
endcom \*\/
%START com maybecom
%%
{startcom}                      {yymore();BEGIN com;}
[\n]+                           {printf("%s%u%c","nl(",yyleng,')');BEGIN maybecom;}
[ \t]+                          ;
<maybecom>[\ \t]*{startcom}     {yymore();BEGIN com;}
<com>[^\*\n]*{endcom}           {printf("%s%u%s%s%s","cm(",yyleng,",\"",yytext,"\")");BEGIN 0;}
<com>[^\*\n]*                   printf("%s%u%s%s%s","cm(",yyleng,",\"",yytext,"\")");
<com>[^\*\n]*\*                 yymore();
<com>[\n]+.                     {unput(yytext[yyleng-1]);printf("%s%u%c","nl(",yyleng-1,')');}
%%
main() {while (2) yylex();}
-- 
Ray Reeves,
CCA-UNIWORKS,20 William St,Wellesley, Ma. 02181. (617)235-2600
emacs!ray@CCA-UNIX
andrew@orca.UUCP (Andrew Klossner) (07/14/85)
>> How do you recognise a PL/1 style comment?
>
> "/*"([^*]|"*"[^/])*"*/"

Two problems:

1) This pattern will incorrectly recognize "/***/ */" as a comment.

2) This approach to comment skipping is a bad idea in lex, because the generated lexer will try to accumulate the entire comment in the "yytext" buffer, which has a fixed size. (On our system, the size is 1024 bytes.) If ever a comment with more bytes than the buffer size is found, the lex driver will merrily overwrite the memory following the buffer and blow away your compile.

  -=- Andrew Klossner   (decvax!tektronix!orca!andrew)       [UUCP]
                        (orca!andrew.tektronix@csnet-relay)  [ARPA]
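Problem 1 can be checked outside lex. The sketch below transcribes the two patterns from this thread into POSIX extended regular expression syntax and asks regexec() how much of an input each one matches. The transcription, the macro names HAAHR_ERE and DUDEK_ERE, and the helper match_len are assumptions made for illustration; POSIX regexec is only a stand-in for the lex matcher, although both are expected to report the same match lengths on these inputs.

```c
#include <regex.h>
#include <stddef.h>

/* The two lex patterns from the thread, transcribed by hand into POSIX
 * ERE syntax (an assumption of this sketch, not text from the posts). */
#define HAAHR_ERE "/\\*([^*]|\\*[^/])*\\*/"
#define DUDEK_ERE "/\\*(/|(\\**[^*/]))*\\*+/"

/* Hypothetical helper: length of the leftmost match of `ere` in `text`,
 * or -1 if nothing matches or the pattern does not compile. */
long match_len(const char *ere, const char *text)
{
    regex_t re;
    regmatch_t m;
    long len = -1;

    if (regcomp(&re, ere, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0)
        len = (long)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```

With these definitions, match_len(HAAHR_ERE, "/***/ */") should be 8 (the whole string, as described in problem 1) and match_len(DUDEK_ERE, "/***/ */") should be 5. On the bare input "/***/" the first pattern finds no match at all, while the second again matches 5 characters.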
hammond@steinmetz.UUCP (Steve Hammond) (07/23/85)
> >> How do you recognise a PL/1 style comment?
> >
> > "/*"([^*]|"*"[^/])*"*/"
>
> Two problems:
>
> 1) This pattern will incorrectly recognize "/***/ */" as a comment.
>
> 2) This approach to comment skipping is a bad idea in lex, because the
> generated lexer will try to accumulate the entire comment in the
> "yytext" buffer, which has a fixed size. (On our system, the size
> is 1024 bytes.) If ever a comment with more bytes than the buffer
> size is found, the lex driver will merrily overwrite the memory
> following the buffer and blow away your compile.
>

I get around #1 by switching modes when I encounter a /* and switching back when I encounter a */. To date I have not overflowed the yytext buffer (I hope).
-- 
Steve Hammond   arpa: hammond@GE   uucp: {...edison!}steinmetz!hammond
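The mode-switching idea can be illustrated without lex. The following is a rough sketch in plain C, not Steve Hammond's actual code: a scanner that is fed one character at a time, flips between a CODE mode and a COMMENT mode on the delimiters, and keeps only a single character of lookbehind, so the length of a comment never affects memory use. All names here (struct scanner, scanner_init, scanner_feed, scanner_feed_str) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical scanner state; none of these names come from the thread. */
enum mode { CODE, COMMENT };

struct scanner {
    enum mode mode;   /* current mode, playing the role of a start condition */
    int prev;         /* previous character, or 0 right after a delimiter */
    size_t comments;  /* number of complete comments seen so far */
    size_t longest;   /* longest comment in bytes, delimiters included */
    size_t cur;       /* length of the comment currently being scanned */
};

void scanner_init(struct scanner *sc)
{
    sc->mode = CODE;
    sc->prev = 0;
    sc->comments = 0;
    sc->longest = 0;
    sc->cur = 0;
}

/* Feed one character; the comment text itself is never stored. */
void scanner_feed(struct scanner *sc, int c)
{
    if (sc->mode == CODE) {
        if (sc->prev == '/' && c == '*') {
            sc->mode = COMMENT;
            sc->cur = 2;
            sc->prev = 0;   /* the opening star must not also close */
            return;
        }
    } else {
        sc->cur++;
        if (sc->prev == '*' && c == '/') {
            sc->mode = CODE;
            sc->comments++;
            if (sc->cur > sc->longest)
                sc->longest = sc->cur;
            sc->prev = 0;   /* the closing slash must not also open */
            return;
        }
    }
    sc->prev = c;
}

/* Convenience wrapper: feed a whole NUL-terminated string. */
void scanner_feed_str(struct scanner *sc, const char *s)
{
    while (*s != '\0')
        scanner_feed(sc, (unsigned char)*s++);
}
```

Feeding "a /***/ */ b /* x */" to a freshly initialised scanner leaves comments at 2 and longest at 7, with the stray " */" after the first comment treated as ordinary text. A comment of several thousand bytes is handled the same way, because nothing is buffered.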