jrv@siemens.UUCP (James R Vallino) (03/09/89)
We're working with this lousy compiler which is choking on files which have too many comments. Does anyone have a sed or awk script which we can use to preprocess the C source and get rid of all the comments before sending it to the compiler?

Thanks!
--
Jim Vallino	Siemens Corporate Research, Princeton, NJ
jrv@siemens.com	princeton!siemens!jrv	(609) 734-3331
mnc@m10ux.UUCP (Michael Condict) (03/13/89)
I recently posted to this group a shell script that calls three sed scripts to extract function prototypes from (practically) any C source. One of the three sed scripts consisted of little more than comment removal -- exactly what you are looking for. Here is the relevant portion:

--------------------------------------------------------------------------
# Delete comments:
: delcom
/\/\*/{
    # Change first comment delim to @ (after eliminating existing @'s):
    s/@/<Used#to%be+an-At>/g
    s:/\*:@:
    # Read until we have the end comment:
    : morecm
    /\*\//!{
        # Just to cut down on max buffer length:
        s/@.*/@/
        N
        b morecm
    }
    # Get rid of any $'s:
    s/\$/<Used#to%be+a-Dollar>/g
    # First occurrence of */ is guaranteed to be the corresponding end
    # comment, because it is otherwise not legal C, so:
    s:\*/:$:
    s/@[^$]*\$/ /
    # Restore $'s and @'s:
    s/<Used#to%be+a-Dollar>/$/g
    s/<Used#to%be+an-At>/@/g
    b delcom
}
------------------------------------------------------------------------------

The disclaimers are that (1) it only works with BSD-derived sed, unless you get rid of all the comments; and (2) it will fail for programs that contain the extremely unlikely "Used#to%be..." strings used as markers in the script.

This has been tested on thousands of lines of source code from various sources, but no guarantees. You get what you pay for.

Mike Condict
--
Michael Condict		{att|allegra}!m10ux!mnc
AT&T Bell Labs		(201)582-5911    MH 3B-416
Murray Hill, NJ
hollombe@ttidca.TTI.COM (The Polymath) (03/16/89)
In article <880@m10ux.UUCP> mnc@m10ux.UUCP (Michael Condict) writes:
}I recently posted to this group a shell script that calls three sed scripts
}to extract function prototypes from (practically) any C source. One of the
}three sed scripts consisted of little more than comment removal -- exactly
}what you are looking for. Here is the relevant portion:
}
}[...]
}The disclaimers are that (1) it only works with BSD-derived sed, unless you
}get rid of all the comments; and (2) it will fail for programs that contain
}the extremely unlikely "Used#to%be..." strings used as markers in the script.

If I understood the original posting correctly, it will also fail if it encounters a /* or */ within a quoted string constant. E.g.:

    char *msg1 = "The symbol \"/*\" begins a comment in C. \n";
    char *msg2 = "The symbol \"*/\" ends a comment in C. \n";

I deliberately added the escaped double-quotes to show that true, safe comment detection and removal isn't a trivial problem. There are probably a number of other "special" cases that can cause a simple, scan-for-/*, scan-for-*/ algorithm to fail.

}This has been tested on thousands of lines of source code from various sources,
}but no guarantees. You get what you pay for.

Sound advice.
--
The Polymath (aka: Jerry Hollombe, hollombe@ttidca.tti.com)  Illegitimati Nil
Citicorp(+)TTI                                                 Carborundum
3100 Ocean Park Blvd.   (213) 452-9191, x2483
Santa Monica, CA  90405 {csun|philabs|psivax}!ttidca!hollombe
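The quoted-string problem described above can be made concrete with a small state machine. What follows is only a sketch, not code from anyone in this thread: the function name strip_comments and its buffer convention are assumptions made for illustration. It tracks whether the scanner is inside a string or character constant, so comment delimiters that appear there are copied through, and it replaces each real comment with a single space.

```c
enum state { CODE, SLASH, COMMENT, STAR, QUOTED, ESCAPE };

/*
 * strip_comments: hypothetical helper, not code from the thread.
 * Copies src to dst, replacing each comment with one space and
 * copying string and character constants through unchanged.
 * dst must hold at least strlen(src) + 1 bytes.
 */
void strip_comments(const char *src, char *dst)
{
    enum state st = CODE;
    char quote = '\0';          /* delimiter of the constant we are inside */

    for (; *src != '\0'; src++) {
        char c = *src;

        switch (st) {
        case CODE:
            if (c == '/') {
                st = SLASH;     /* hold the slash until we see what follows */
            } else {
                if (c == '"' || c == '\'') {
                    quote = c;
                    st = QUOTED;
                }
                *dst++ = c;
            }
            break;
        case SLASH:
            if (c == '*') {
                st = COMMENT;   /* comment opener: drop the held slash */
            } else if (c == '/') {
                *dst++ = '/';   /* emit one slash, keep holding the other */
            } else {
                *dst++ = '/';
                *dst++ = c;
                st = CODE;
                if (c == '"' || c == '\'') {
                    quote = c;
                    st = QUOTED;
                }
            }
            break;
        case COMMENT:
            if (c == '*')
                st = STAR;
            break;
        case STAR:              /* inside a comment, just saw a star */
            if (c == '/') {
                *dst++ = ' ';   /* comment closed: leave one space */
                st = CODE;
            } else if (c != '*') {
                st = COMMENT;
            }
            break;
        case QUOTED:
            *dst++ = c;
            if (c == '\\')
                st = ESCAPE;
            else if (c == quote)
                st = CODE;
            break;
        case ESCAPE:            /* character after a backslash is literal */
            *dst++ = c;
            st = QUOTED;
            break;
        }
    }
    if (st == SLASH)
        *dst++ = '/';           /* input ended on a held slash */
    *dst = '\0';
}
```

Fed the text of msg1 above, this should copy the whole string constant through untouched. It makes no attempt to handle trigraphs or backslash-newline splicing inside a delimiter, and an unterminated comment is silently dropped, so treat it as an illustration of the quoting issue rather than a finished tool.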
cdold@starfish.Convergent.COM (Clarence Dold) (03/18/89)
From article <4060@ttidca.TTI.COM>, by hollombe@ttidca.TTI.COM (The Polymath):
> In article <880@m10ux.UUCP> mnc@m10ux.UUCP (Michael Condict) writes:
> }I recently posted to this group a shell script that calls three sed scripts
> }to extract function prototypes from (practically) any C source. One of the
> }three sed scripts consisted of little more than comment removal -- exactly
> }what you are looking for. Here is the relevant portion:

I managed to miss the original question, but none of the replies I've seen use the compiler to strip comments.

From UNIX

    cpp target.c

will strip off the comments.

From Microsoft QuickC,

    QCL -E target.c

will strip the comments.

Since both cases are a part of a normal compile, they have to be 'double-escape/comment in a comment'-proof.
--
Clarence A Dold - cdold@starfish.Convergent.COM	(408) 435-5293
		...pyramid!ctnews!starfish!cdold
		P.O.Box 6685, San Jose, CA 95150-6685
mnc@m10ux.UUCP (Michael Condict) (03/24/89)
In article <4060@ttidca.TTI.COM>, hollombe@ttidca.TTI.COM (The Polymath) writes:
> In article <880@m10ux.UUCP> mnc@m10ux.UUCP (Michael Condict) writes:
> }I recently posted to this group a shell script that
> } [ deletes comments from C source, among other things ]
> . . .
> If I understood the original posting correctly, it will also fail if it
> encounters a /* or */ within a quoted string constant. E.g.:
> . . .

Oops, you are absolutely right. After some analysis of this limitation in my sed script, it is obvious that the regular expressions of sed (or awk or vi/ex/ed) are too limited to handle the job in any reasonable fashion. Besides, the lex script that does the job is trivial.

Someone pointed out that they were posting a six-line lex script to comp.sources.unix. This doesn't seem like the best way to display the solution, since the article announcing the posting was itself longer than six lines. I'll throw out the following 3-line lex script, which has been tested on all the devious ways of forming comments and quotes that I can think of. In particular, it handles comment delimiters within quotes and quotes within comment delimiters:

----------- Lex script to delete comments from C source code ----------------
%%
\"([^\\"]*\\(.|\n))*[^\\"]*\"	ECHO;
"/*"([^*]*"*"[^/])*[^*]*"*/"	;
.	ECHO;
-----------------------------------------------------------------------------

Can anyone find anything wrong with this one (he asks stupidly)? Can anyone find a shorter solution? Boy this is almost as much fun as computing factorial in the minimum-sized C program.
--
Michael Condict		{att|allegra}!m10ux!mnc
AT&T Bell Labs		(201)582-5911    MH 3B-416
Murray Hill, NJ
tps@chem.ucsd.edu (Tom Stockfisch) (03/25/89)
In article <891@m10ux.UUCP> mnc@m10ux.UUCP (Michael Condict) writes:
#----------- Lex script to delete comments from C source code ----------------
#%%
#\"([^\\"]*\\(.|\n))*[^\\"]*\" ECHO;
#"/*"([^*]*"*"[^/])*[^*]*"*/" ;
#. ECHO;
#Can anyone find anything wrong with this one (he asks stupidly)?

Fails on

    /***/
--
|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu
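The /***/ case trips the quoted pattern because each asterisk inside the comment body has to be followed by one non-slash character, so an odd run of asterisks leaves nothing for the closing "*/" to match. One way to see what a matcher has to do instead is the sketch below. The function name comment_end is hypothetical and the code is not from the thread; it simply remembers whether the previous character was an asterisk, so a run of asterisks of any length before the closing slash is accepted.

```c
#include <stddef.h>

/*
 * comment_end: hypothetical helper, not code from the thread.
 * If p points at a comment opener, return a pointer just past the
 * matching closer. Return NULL if p does not start a comment or
 * the comment never ends.
 */
const char *comment_end(const char *p)
{
    int saw_star = 0;           /* was the previous body character a star? */

    if (p[0] != '/' || p[1] != '*')
        return NULL;

    /* Start after the opener so its star cannot double as the closer's. */
    for (p += 2; *p != '\0'; p++) {
        if (saw_star && *p == '/')
            return p + 1;
        saw_star = (*p == '*');
    }
    return NULL;
}
```

Under these assumptions the five-character input /***/ is consumed whole, while the three-character input /*/ is correctly reported as unterminated.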