[comp.unix.programmer] Give me your strings.

cl@lgc.com (Cameron Laird) (02/12/91)

I want to be able to type

	show_me_source_strings source1.c source2.c

and receive something like

	source1.c:	"This is a string in the C-source source1.c"
	source1.c:	"I'm a string too, passed to the function %s.\n"
	source1.c:	"I initialize a char *."
	source2.c:	"Me too; I'm in the string-space of this program."
	source2.c:	"There's another string in this file, but it's in comments."

Is the point clear?  I'm looking for an executable that knows enough C
(or Pascal, ...) syntax to isolate string constants, and echo them out
to a file (possibly stdout).

It's easy enough to write a grep or sed script that finds all
lines with a couple of "-s in them, but I'm curious whether there is a
Better (more accurate, powerful, ...) Way.  Is there a standard, modern,
low-cost fashion for getting at the syntactic elements of C source?  If
I became adept at YACC, could I code this up in two minutes?  Is there a
public-domain C parser that everyone uses to construct filters such as I
have in mind?  Has Harry Spencer written an awk program that does this,
or will emacs give it to me if I type CTL-\-ESC-ALT-&-F7-...?
--

Cameron Laird		USA 713-579-4613
cl@lgc.com		USA 713-996-8546

[What you want to do is lexical analysis, and that's what lex and flex do.
Many C lexers are floating around the net; see the comp.compilers monthly
posting for some suggestions.  By the way, his name is Henry. -John]
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

tchrist@convex.COM (Tom Christiansen) (02/13/91)

From the keyboard of cl@lgc.com (Cameron Laird):
:I want to be able to type
:
:	show_me_source_strings source1.c source2.c
:
:and receive something like
:
:	source1.c:	"This is a string in the C-source source1.c"
:	source1.c:	"I'm a string too, passed to the function %s.\n"
:	source1.c:	"I initialize a char *."
:	source2.c:	"Me too; I'm in the string-space of this program."
:	source2.c:	"There's another string in this file, but it's in comments."
:
:Is the point clear?  I'm looking for an executable that knows enough C
:(or Pascal, ...) syntax to isolate string constants, and echo them out
:to a file (possibly stdout).

The cxref program will do this for you.  You could also use xstr.  You'll
have to postprocess each of these to get precisely the output you
specified.  xstr is available on uunet:bsd-sources/pgrm/xstr/* via anon
ftp.  I would use xstr myself.

Perl solutions available upon demand. :-)

--tom

hoffman@nunki.crd.ge.com (William A. Hoffman) (02/14/91)

:...  I'm looking for an executable that knows enough C
:(or Pascal, ...) syntax to isolate string constants, and echo them out
:to a file (possibly stdout).


What about a simple lex program: string.lex

--------------------------------------------------------
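%{
#include <stdio.h>
%}
string       \"([^"\\\n]|\\(.|\n))*\"
%%
{string}	{ printf("%s\n", yytext); return 1; }
\n		;
.		;
%%
/* Call yylex() until it returns 0 at end of input. */
int main(void)
	{
	while (yylex())
		;
	return 0;
	}

/* No further input files: tell lex to stop at end of input. */
int yywrap(void)
	{
	return 1;
	}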
------------------------------------------------------------
To build and run it:
lex string.lex
cc lex.yy.c -o string
cat *.c | ./string

[I'd make it a little bit smarter to handle character constants and
comments, but in general that's the right idea. -John]
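
As a rough illustration of the "little bit smarter" version suggested
above, here is a minimal hand-written sketch in plain C rather than lex.
It skips comments and character constants and prefixes each string with
a file name, as in the requested output. The function name show_strings
and its signature are invented for this example, and the handling of
// comments and of unterminated strings (cut off at the newline) is an
assumption, not something specified in the thread.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy every string literal found in `in` to `out`,
 * one per line, prefixed with `name` and a tab.  Comments and character
 * constants are skipped.  Returns the number of literals printed.
 */
int show_strings(FILE *in, const char *name, FILE *out)
{
    int c, count = 0;

    while ((c = getc(in)) != EOF) {
        if (c == '/') {
            int d = getc(in);
            if (d == '*') {
                /* Block comment: skip up to the closing star-slash. */
                int prev = 0;
                while ((c = getc(in)) != EOF && !(prev == '*' && c == '/'))
                    prev = c;
            } else if (d == '/') {
                /* Line comment (assumed allowed): skip to end of line. */
                while ((c = getc(in)) != EOF && c != '\n')
                    ;
            } else if (d != EOF) {
                ungetc(d, in);      /* plain '/', e.g. a division */
            }
        } else if (c == '"' || c == '\'') {
            int quote = c;
            int print = (quote == '"');   /* only strings are echoed */

            if (print)
                fprintf(out, "%s:\t\"", name);
            /* Stop at the matching quote, a raw newline, or EOF. */
            while ((c = getc(in)) != EOF && c != quote && c != '\n') {
                if (print)
                    putc(c, out);
                /* A backslash protects the next character. */
                if (c == '\\' && (c = getc(in)) != EOF && print)
                    putc(c, out);
            }
            if (print) {
                fputs("\"\n", out);
                count++;
            }
        }
    }
    return count;
}
```

To get the command asked for at the top of the thread, a main() would
fopen() each file named in argv and call show_strings(fp, argv[i],
stdout) on it. Trigraphs, preprocessor tricks, and adjacent-string
concatenation are deliberately ignored in this sketch.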