ian@ux.cs.man.ac.uk (Ian Cottam) (03/20/89)
I usually use a lex script for such things.  I didn't have one for
C, but the following might do the trick.  N.B. Not tested, not proven,
no warranty!
________________________________________________________________________
%{
/***** Lex script to strip comments from C texts ******/
%}
%s COMMENT STRING CHAR
%%
<INITIAL>\'     {BEGIN CHAR; ECHO;}
<INITIAL>\"     {BEGIN STRING; ECHO;}
<INITIAL>"/*"   BEGIN COMMENT;
<INITIAL>.      ECHO;
<INITIAL>\n     ECHO;
<CHAR>\\'       ECHO;
<CHAR>\'        {ECHO; BEGIN INITIAL;}
<STRING>\\\"    ECHO;
<STRING>\"      {ECHO; BEGIN INITIAL;}
<COMMENT>"*/"   BEGIN INITIAL;
<COMMENT>.      ;
<COMMENT>\n     ;
%%
-----------------------------------------------------------------
Ian Cottam, Room IT101, Department of Computer Science,
University of Manchester, Oxford Road, Manchester, M13 9PL, U.K.
Tel: (+44) 61-275 6157    FAX: (+44) 61-275-6280
ARPA: ian%ux.cs.man.ac.uk@nss.cs.ucl.ac.uk
JANET: ian@uk.ac.man.cs.ux
UUCP: ..!mcvax!ukc!mur7!ian
-----------------------------------------------------------------
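For readers without lex to hand, the same four-state idea (ordinary text, comment, string literal, character literal) can be sketched in plain C. This is only an illustration, not code from the post: the name strip_comments and the buffer-to-buffer interface are assumptions. It also differs from the script in one respect: inside a literal it copies a backslash together with whatever character follows it, whereas the script's <STRING>\\\" rule appears to misread a literal that ends in an escaped backslash, such as "\\". Like the script, it deletes comments outright rather than replacing them with a space, and it knows nothing about #include or trigraphs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The four states mirror INITIAL, COMMENT, STRING and CHAR in the script. */
enum state { CODE, COMMENT, STRING, CHARLIT };

/*
 * Hypothetical helper (not from the original post): copy src to dst with
 * C comments removed.  dst must have room for strlen(src) + 1 bytes.
 * Returns the number of characters written, excluding the terminator.
 */
size_t strip_comments(const char *src, char *dst)
{
    enum state st = CODE;
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);

    while (src[i] != '\0') {
        char c = src[i];

        switch (st) {
        case CODE:
            if (c == '/' && src[i + 1] == '*') {
                st = COMMENT;           /* drop the comment opener */
                i += 2;
                continue;
            }
            if (c == '"')
                st = STRING;
            else if (c == '\'')
                st = CHARLIT;
            dst[o++] = c;
            i++;
            break;

        case COMMENT:
            if (c == '*' && src[i + 1] == '/') {
                st = CODE;              /* drop the comment closer */
                i += 2;
            } else {
                i++;                    /* drop the comment body */
            }
            break;

        case STRING:
        case CHARLIT:
            if (c == '\\' && src[i + 1] != '\0') {
                memcpy(dst + o, src + i, 2);  /* keep escape pair intact */
                o += 2;
                i += 2;
                continue;
            }
            if ((st == STRING && c == '"') || (st == CHARLIT && c == '\''))
                st = CODE;              /* closing quote ends the literal */
            dst[o++] = c;
            i++;
            break;
        }
    }
    dst[o] = '\0';
    return o;
}
```

Calling strip_comments("a /* b */ c", out) with a large enough out leaves "a  c" in the buffer (two spaces, since the comment is deleted rather than replaced), while a string literal containing a comment opener passes through unchanged.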
leo@philmds.UUCP (Leo de Wit) (03/21/89)
In article <5693@ux.cs.man.ac.uk> ian@ux.cs.man.ac.uk (Ian Cottam) writes:
|
|I usually use a lex script for such things.  I didn't have one for
|C, but the following might do the trick.  N.B. Not tested, not proven,
|no warranty!

[lex script omitted]

This will cover most ordinary cases; but not this one:
(startcom.h is either empty or contains a /* ).

main()
{
    puts("Testing 1");
#include "startcom.h"
    puts("Testing 2");
    /* puts("Testing 3"); */
}

The second puts should get commented out or not, depending on the
contents of the header file.  OK, I'll admit, it is a bit far-fetched
8-); it however proves once again that it isn't exactly trivial to do
the general case right.

    Leo.
ian@ux.cs.man.ac.uk (Ian Cottam) (03/21/89)
In article <985@philmds.UUCP> leo@philmds.UUCP (Leo de Wit) writes:
>In article <5693@ux.cs.man.ac.uk> ian@ux.cs.man.ac.uk (Ian Cottam) writes:
>|
>|I usually use a lex script for such things....
>This will cover most ordinary cases; but not this one:
>
>[an include file...] (startcom.h is either empty or contains a /* ).

According to my understanding of the pANS C preprocessor and the
notion of a "translation unit", your example is erroneous as the
comment must be terminated within startcom.h.  (As a practical
man I also confirmed my suspicion with the help of gcc :-) )

However, I suspect someone can come up with a case that will
throw my lex script as comment-strippers always require more
thought than people can believe possible.
leo@philmds.UUCP (Leo de Wit) (03/23/89)
In article <5695@ux.cs.man.ac.uk> ian@mucs.UUCP (Ian Cottam) writes:
|According to my understanding of the pANS C preprocessor and the
|notion of a "translation unit", your example is erroneous as the
|comment must be terminated within startcom.h.  (As a practical
|man I also confirmed my suspicion with the help of gcc :-) )

OK, if your stripper is pANS conforming (whatever that means) it should
handle trigraphs too (it doesn't).  If it isn't, then, being a practical
man, try and test my sample program on an Ultrix 2.x C compiler (4.3 BSD
will probably do too).  It'll compile just fine.

|However, I suspect someone can come up with a case that will
|throw my lex script as comment-strippers always require more
|thought than people can believe possible.

That's exactly the point I'm trying to make all the time: it shouldn't
be difficult, it isn't entirely trivial however (it may even depend on
the version of pANS you're reading; look for instance for /* in include
file names, compare the Feb '87 and May '88 drafts).

    Leo.
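The trigraph point can be made concrete. Under the ANSI rules the three characters ??/ stand for a backslash, so a literal written as "a??/"b" is a single string containing an escaped quote, and a stripper that ignores trigraphs would think the string ends at the second quote. One possible way to cope, sketched here and not taken from any post in the thread, is a separate pass that replaces trigraphs before the comment stripper runs. The name replace_trigraphs and its interface are assumptions, and the sketch does not attempt the line splicing that follows trigraph replacement in a real compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the thread): copy src to dst, replacing
 * each of the nine ANSI trigraphs with the single character it denotes.
 * dst must have room for strlen(src) + 1 bytes.  Returns the number of
 * characters written, excluding the terminator.
 */
size_t replace_trigraphs(const char *src, char *dst)
{
    /* Third character of each trigraph, and the character it maps to. */
    static const char from[] = "=(/)'<!>-";
    static const char to[]   = "#[\\]^{|}~";
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);

    while (src[i] != '\0') {
        const char *p;

        /* Two question marks followed by a listed character: replace. */
        if (src[i] == '?' && src[i + 1] == '?' && src[i + 2] != '\0'
            && (p = strchr(from, src[i + 2])) != NULL) {
            dst[o++] = to[p - from];
            i += 3;
        } else {
            dst[o++] = src[i++];    /* anything else is copied as-is */
        }
    }
    dst[o] = '\0';
    return o;
}
```

Running this pass first means the later stripper sees an ordinary backslash and can treat the following quote as escaped. Question marks that do not form a trigraph are left alone.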