[comp.lang.c] How Does 'C' Store Strings ?

childers@avsd.UUCP (Richard Childers) (10/12/89)

I'm trying to create a MS-DOS equivalent of the UNIX utility 'strings' -
ultimately part of a public-domain package of UNIX utilities that will
compile under both MSDOS and UNIX - and I've run into a problem.

Basically, no matter how I look at the resulting executable - whether
with my 'strings', or a quick-and-dirty 'od' - I can't find the ASCII
characters corresponding to the strings I thought I had compiled into
the executable.

As far as I know, in UNIX, char is stored as individual allocated bytes,
perfectly accessible, perfectly in accord with ASCII specifications.

I've tried explicitly defining char arrays, i.e.

	#define	vers[CMDBUFSIZ] =	"v1.00 891010 richard childers" ;

... as well as trying to find strings built into fprintf() calls, to no
avail. What am I missing ?

One of the possibilities I've considered includes the fact that, while I've
defined this array, I've never referenced it, and thus the compiler might
have decided to optimize it out of existence. Another possibility is that
strings found in printf() or fprintf() are compressed. If so, I haven't
seen any reference to how this might be turned off, although there are three
manuals to peruse. Am I going to have to write a decompression algorithm ?
That's going to have to be applied to every byte ? And if I want to bury ID
strings in my code, am I going to have to initialize strings on a byte-by-byte
basis ? Ay, caramba !!

I'm using Microsoft C v4.00 on a Wyse PC with about 128 KB on board ...

-- richard

-- 
 *	A CITIZEN:   "Who might you be ? Samson ? --"                         *
 *	CYRANO:      "Precisely. Would you kindly lend me your jawbone ?"     *
 *                    from _Cyrano de Bergerac_, by Edmond Rostand            *
 *        ..{amdahl|decwrl|octopus|pyramid|ucbvax}!avsd.UUCP!childers         *

cpcahil@virtech.UUCP (Conor P. Cahill) (10/12/89)

In article <2141@avsd.UUCP>, childers@avsd.UUCP (Richard Childers) writes:

> Basically, no matter how I look at the resulting executable - whether
> with my 'strings', or a quick-and-dirty 'od' - I can't find the ASCII
> characters corresponding to the strings I thought I had compiled into
> the executable.

How are you opening the input file?  In MSC you must specify something
like O_BINARY in order to read a complete non-text file.  The strings
are stored the same way in MSC as they are stored in UNIX executables -
non-compressed sequences of characters followed by a null byte.

Long ago I wrote a strings for dos which worked correctly under MSC 3.0.
I don't know where it is now.
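
A minimal sketch of the binary-mode point above, in portable C rather
than MSC-specific calls. It assumes the library's fopen() accepts the
standard "b" flag, which plays the role that O_BINARY plays for open():
in text mode a DOS runtime typically translates CR/LF pairs and treats a
Ctrl-Z byte as end of file, so a scan of an executable can stop early.
The name strings_in_file and the 4-character threshold MIN_RUN are
assumptions made for illustration, not anyone's actual program.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define MIN_RUN 4    /* assumed: shortest run worth reporting */

/*
 * Print every run of at least MIN_RUN printable characters found in
 * the file `path` to `out`, one run per line.  Returns the number of
 * runs printed, or -1 if the file cannot be opened.
 * (Hypothetical helper, for illustration only.)
 */
long strings_in_file(const char *path, FILE *out)
{
    FILE *fp;
    char head[MIN_RUN];  /* holds a run until it is long enough */
    size_t n = 0;        /* length of the current run */
    long found = 0;
    int c;

    assert(path != NULL && out != NULL);

    fp = fopen(path, "rb");  /* "b": read the bytes untranslated */
    if (fp == NULL)
        return -1;

    while ((c = getc(fp)) != EOF) {
        if (isprint(c)) {
            if (n < MIN_RUN)
                head[n] = (char)c;
            n++;
            if (n == MIN_RUN) {          /* run just became reportable */
                fwrite(head, 1, MIN_RUN, out);
                found++;
            } else if (n > MIN_RUN) {    /* keep extending the run */
                putc(c, out);
            }
        } else {
            if (n >= MIN_RUN)
                putc('\n', out);         /* close the reported run */
            n = 0;
        }
    }
    if (n >= MIN_RUN)
        putc('\n', out);

    fclose(fp);
    return found;
}
```

Calling strings_in_file("prog.exe", stdout) would print one candidate
string per line; a return of -1 means the file could not be opened.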

> 	#define	vers[CMDBUFSIZ] =	"v1.00 891010 richard childers" ;

What is this supposed to do?  First of all, the only way to use it is to
have a global variable as follows:

	char string_you_want vers;

which is totally unreadable.  What were you attempting to do?  There is
nothing that you can do through the preprocessor that you couldn't do
directly in the code.

> One of the possibilities I've considered includes the fact that, while I've
> defined this array, I've never referenced it, and thus the compiler might
> have decided to optimize it out of existence.

If this were true, none of the sccsid and rcsid strings would ever
appear in the object files (which they do).



-- 
+-----------------------------------------------------------------------+
| Conor P. Cahill     uunet!virtech!cpcahil             703-430-9247    |
| Virtual Technologies Inc.,    P. O. Box 876,   Sterling, VA 22170     |
+-----------------------------------------------------------------------+

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (10/12/89)

In article <2141@avsd.UUCP>, childers@avsd.UUCP (Richard Childers) writes:

|  As far as I know, in UNIX, char is stored as individual allocated bytes,
|  perfectly accessible, perfectly in accord with ASCII specifications.
|  
|  I've tried explicitly defining char arrays, i.e.
|  
|  	#define	vers[CMDBUFSIZ] =	"v1.00 891010 richard childers" ;
|  
|  ... as well as trying to find strings built into fprintf() calls, to no
|  avail. What am I missing ?

  You may have two problems here. One is that something defined to the
preprocessor via #define never makes it into the program unless you use
it. One way to define your string is to do something like:

	char *my_id = "The string you want, like copyright";

  I make mine static, but I think it would be legal for a compiler to
optimize out an unreferenced static.
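
The difference between a macro and an object can be sketched as
follows. This is an illustration under stated assumptions, not a
statement about what MSC 4.00 emits: the names VERS_MACRO, my_id,
id_string and has_sccs_prefix are made up here, vers is declared without
CMDBUFSIZ because that constant's value is not given in the thread, and
whether an unreferenced static survives is up to the compiler.

```c
#include <assert.h>
#include <string.h>

/* A macro is only text for the preprocessor.  It is never expanded
 * below, so it contributes no bytes to the object file. */
#define VERS_MACRO "v1.00 macro only"

/* An initialized array is an object: its characters, plus the
 * terminating null, are normally stored as plain bytes in the data. */
char vers[] = "v1.00 891010 richard childers";

/* A static ID string in the SCCS "@(#)" style (contents illustrative). */
static const char my_id[] = "@(#)example.c 1.0 (illustrative)";

/* Referencing the static string gives the compiler a reason to keep it. */
const char *id_string(void)
{
    return my_id;
}

/* Returns 1 if s starts with the "@(#)" marker that what(1) looks for. */
int has_sccs_prefix(const char *s)
{
    assert(s != NULL);
    return strncmp(s, "@(#)", 4) == 0;
}
```

Running a working "strings" over the resulting object file would be
expected to show the vers and my_id text but not the macro's.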

  If you can't find strings which are formats of printfs you may have a
broken "strings." I have used the one I have and it works for MSC and TC
at least.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

childers@avsd.UUCP (Richard Childers) (10/14/89)

I recently said ...

>I've tried explicitly defining char arrays, i.e.
>
>	#define	vers[CMDBUFSIZ] =	"v1.00 891010 richard childers" ;

I actually meant to say ...

	char vers[CMDBUFSIZ] =		"v1.00 891010 richard childers" ;

... which changes the problem somewhat.

A wide variety of people have replied, and, much to my surprise, nobody
felt it necessary to call me 'hosehead' or tell me to go to a different
newsgroup, such as alt.msdos.programmer, for which I am thankful.

The best help I've gotten to date suggested that I use 'static' storage
classes for SCCS-type buried ID strings, and another individual at UC
Santa Cruz suggested I try opening the file using "binary" mode, which
doesn't seem to be documented in my version of MSC.

One possibility that's occurred to me is that, in a PC environment, the
designers of the compiler might have decided that string compression was
a win, much as (according to many contributors) Lattice's compiler tries
to identify and eliminate redundant strings from the resulting image,
given the significant decrease in space available in an MS-DOS
environment.

I've been informed that if this is true, it would be useful information to
know, and I'll keep everyone posted on what I find out ...

-- richard



mustard@sdrc.UUCP (Sandy Mustard) (10/17/89)

In article <2157@avsd.UUCP>, childers@avsd.UUCP (Richard Childers) writes:
> I recently said ...
> 
> 	char vers[CMDBUFSIZ] =		"v1.00 891010 richard childers" ;
> 
> 
> The best help I've gotten to date suggested that I use 'static' storage
> classes for SCCS-type buried ID strings
> 
> One possibility that's occurred to me is that, in a PC environment, the
> designers of the compiler might have decided that string compression was
> a win, much as (according to many contributors) Lattice's compiler tries
> to identify and eliminate redundant strings from the resulting image,
> given the significant decrease in space available in an MS-DOS
> environment.

You may also want to use 

static const char vers.....
       ^^^^^
This may help avoid the redundant string elimination. 

Would not the following be true?

static char string1[] = "ABCD";
static char string2[] = "ABCD";

The compiler could eliminate the redundant strings (when appropriate),

whereas:

static const char string1[] = "ABCD";
static char string2[] = "ABCD";

should force the compiler to store two separate strings. 

(I hope someone will correct me if I'm wrong.:-))
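
A hedged sketch of the distinction in question. As far as standard C
goes, two arrays are distinct objects whether or not they are const, so
their addresses differ whenever a program compares them; it is identical
string literals reached through pointers that a compiler is allowed, but
not required, to store only once. The names lit1 and lit2 and both
functions are made up for illustration.

```c
#include <assert.h>

/* Two arrays: each has its own storage, initialized from "ABCD". */
static char string1[] = "ABCD";
static char string2[] = "ABCD";

/* Two pointers to identical literals: the compiler may share them. */
static const char *lit1 = "ABCD";
static const char *lit2 = "ABCD";

/* Returns 1: the arrays occupy different addresses. */
int arrays_are_distinct(void)
{
    const char *p = string1;
    const char *q = string2;

    string1[0] = 'a';            /* modify one array ... */
    assert(string2[0] == 'A');   /* ... the other is unaffected */
    string1[0] = 'A';            /* restore the original contents */

    return p != q;
}

/* Returns 1 if this compiler merged the two literals, 0 if it did
 * not.  Either answer is permitted. */
int literals_are_shared(void)
{
    return lit1 == lit2;
}
```

A program that never takes or compares the addresses cannot tell whether
the compiler merged the storage, so what actually lands in the
executable can still vary from one compiler to the next.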

Sandy

cpcahil@virtech.UUCP (Conor P. Cahill) (10/18/89)

In article <914@sdrc.UUCP>, mustard@sdrc.UUCP (Sandy Mustard) writes:
> You may also want to use 
> 
> static const char vers.....
>        ^^^^^
> This may help avoid the redundant string elimination. 

The const should be a giant flag to the compiler that this data is 
the perfect choice for redundant data elimination since it won't be 
changed.
