[comp.unix.programmer] Optimizing out unreferenced variables

davidk@dsinet (David Karr) (05/07/91)

I have been hearing about a tendency for certain Unix optimizing C compilers
to deal harshly with static variables declared in C modules that are not
referenced in that module.  In other words, it will delete those variables
from the object file.  A controversial example would be variables declared
in each module to hold RCS or SCCS information.  Often these variables will
be declared as static, and only used by certain utilities to parse out the
version numbers from an executable binary.

I was told that the C compiler on AIX has this particular "affliction".  Is
this a general feature of optimizing C compilers, and will more compilers
be adding this "feature" as time goes on, or is the AIX compiler a fluke?  I
heard a mention that the HP 700 compiler would be doing this in the future.
-- 
Digital Systems International, Inc.	David Karr
7730 177th Pl NE			dsinet!davidk
Redmond, WA   98073-0903
(206) 881-7544 ext. 547

clewis@ferret.ocunix.on.ca (Chris Lewis) (05/07/91)

In article <608@elroy> davidk@dsinet (David Karr) writes:
|I have been hearing about a tendency for certain Unix optimizing C compilers
|to deal harshly with static variables declared in C modules that are not
|referenced in that module.  In other words, it will delete those variables
|from the object file.  A controversial example would be variables declared
|in each module to hold RCS or SCCS information.  Often these variables will
|be declared as static, and only used by certain utilities to parse out the
|version numbers from an executable binary.

|I was told that the C compiler on AIX has this particular "affliction".  Is
|this a general feature of optimizing C compilers, and will more compilers
|be adding this "feature" as time goes on, or is the AIX compiler a fluke?  I
|heard a mention that the HP 700 compiler would be doing this in the future.

I've not tried this myself, but I would presume that if the compiler
is ANSI compliant, you can probably declare such variables "volatile"
and the compiler will not optimize them out.  Volatile does work in
similar situations with the RS/6000 compiler.
-- 
Chris Lewis, Phone: (613) 832-0541, Domain: clewis@ferret.ocunix.on.ca
UUCP: ...!cunews!latour!ecicrl!clewis; Ferret Mailing List:
ferret-request@eci386; Psroff (not Adobe Transcript) enquiries:
psroff-request@eci386 or Canada 416-832-0541.  Psroff 3.0 in c.s.u soon!

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (05/08/91)

In article <1475@ecicrl.ocunix.on.ca> clewis@ferret.ocunix.on.ca (Chris Lewis) writes:
> I've not tried this myself, but I would presume that if the compiler
> is ANSI compliant, you can probably declare such variables "volatile"
> and the compiler will not optimize them out.  Volatile does work in
> similar situations with the RS/6000 compiler.

It's a whole bunch more portable and useful to give the user some
options to print out those strings. End of story.

---Dan

rob@array.UUCP (Rob Marchand) (05/08/91)

In article <608@elroy> davidk@dsinet (David Karr) writes:
|I have been hearing about a tendency for certain Unix optimizing C compilers
|to deal harshly with static variables declared in C modules that are not
|referenced in that module.  In other words, it will delete those variables
|from the object file.  A controversial example would be variables declared
|in each module to hold RCS or SCCS information.  Often these variables will
|be declared as static, and only used by certain utilities to parse out the
|version numbers from an executable binary.

	I have seen this as well.

|I was told that the C compiler on AIX has this particular "affliction".  Is
|this a general feature of optimizing C compilers, and will more compilers
|be adding this "feature" as time goes on, or is the AIX compiler a fluke?  I
|heard a mention that the HP 700 compiler would be doing this in the future.

	I believe that an older version of the VMS C compiler would 
	optimize out static variables that were not used.  The case with RCS
	that you mention is in fact the way I found this out.  I don't
	know about newer versions of the compiler...
	More an inconvenience than anything, but in some cases it is
	sure handy to be able to identify the contents of the binary
	with certainty.   I guess the question is whether you also patch
	the RCS $Id$ string when you do binary patches to the file :-)
	Kidding!  Just kidding!

	Cheers!
	Rob Marchand
-- 
Rob Marchand                   UUCP  : uunet!attcan!lsuc!array!rob
Array Systems Computing        ARPA  : rob%array.UUCP@uunet.UU.NET
401 Magnetic Drive, Unit 24    Phone : +1(416)736-0900   Fax: (416)736-4715
Downsview, Ont CANADA M3J 3H9  Telex : 063666 (CNCP EOS TOR) .TO 21:ARY001

clewis@ferret.ocunix.on.ca (Chris Lewis) (05/09/91)

In article <13206:May719:03:1491@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <1475@ecicrl.ocunix.on.ca> clewis@ferret.ocunix.on.ca (Chris Lewis) writes:
>> I've not tried this myself, but I would presume that if the compiler
>> is ANSI compliant, you can probably declare such variables "volatile"
>> and the compiler will not optimize them out.  Volatile does work in
>> similar situations with the RS/6000 compiler.

>It's a whole bunch more portable and useful to give the user some
>options to print out those strings. End of story.

What exactly do you mean by this?  what(1)?  (how do you get the strings
to stay in the binary without volatile in such compilers?)  A -V option?
(ain't adequate - software frequently has more than one version number -
the version of each submodule is also important in bug tracking in big
systems.  Or for identifying version numbers of the library routines you
linked with)

It's a little more general problem than just SCCS or RCS idents.  Such
as operations on hardware devices being reordered or being optimized out
in device drivers.  But you already knew that.
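
A minimal sketch of the device-driver case, assuming a hypothetical
status register and command register reached through pointers.  The
names wait_ready, send_reset_then_start and DEV_READY, and the command
values, are made up for illustration and do not come from any real
hardware.  The point is only that the volatile qualifier obliges the
compiler to perform every read and write, in order:

```c
/* Hypothetical bit meaning "device ready"; not from any real hardware. */
#define DEV_READY 0x01u

/*
 * Poll a status register until the ready bit is set or the retry
 * budget runs out.  Because the pointer targets a volatile object,
 * every iteration must perform a real read; without volatile the
 * compiler could read once and hoist the load out of the loop.
 * Returns the number of reads performed, or -1 on timeout.
 */
int wait_ready(volatile unsigned *status, int max_polls)
{
    int polls;

    for (polls = 1; polls <= max_polls; polls++)
        if (*status & DEV_READY)
            return polls;
    return -1;
}

/*
 * Write two commands to a command register.  Both stores must be
 * performed, in this order; without volatile the first could be
 * dropped as a dead store.
 */
void send_reset_then_start(volatile unsigned *cmd)
{
    *cmd = 0x80u;   /* hypothetical "reset" command */
    *cmd = 0x01u;   /* hypothetical "start" command */
}
```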

SVR3's "#ident" stuff is nice, but unfortunately rather unportable (the string
isn't in .text or .data - and isn't part of the executable image).  SCO's
damn Xenix C compiler kacks on cpp directives it doesn't like even if
ifdef'd out:
	#ifdef	NEVER
	#ident ....
	#endif
Grrrrr.....

This is more portable than "#ident":

#ifdef	__STDC__
#define	VOLATILE volatile
#else
#define	VOLATILE
#endif

#ifndef	lint
VOLATILE static char SCCSID[] = "@(#)....";
#endif

("VOLATILE static" does seem like an oxymoron, doesn't it? ;-) [No,
the semantics are not in conflict])
-- 
Chris Lewis, Phone: (613) 832-0541, Domain: clewis@ferret.ocunix.on.ca
UUCP: ...!cunews!latour!ecicrl!clewis; Ferret Mailing List:
ferret-request@eci386; Psroff (not Adobe Transcript) enquiries:
psroff-request@eci386 or Canada 416-832-0541.  Psroff 3.0 in c.s.u soon!

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (05/09/91)

In article <1477@ecicrl.ocunix.on.ca> clewis@ferret.ocunix.on.ca (Chris Lewis) writes:
> >It's a whole bunch more portable and useful to give the user some
> >options to print out those strings. End of story.
> What exactly do you mean by this?  what(1)?

No. what(1) is not supported by the language or the complete programming
environment, so it is inherently unreliable.

> A -V option?

All that's important for this problem is that each library provide a
routine that returns a version string.

If you want this to work recursively, do something like void foovers(f)
int (*f)(). foovers calls f with its version string as an argument. If f
returns 1, foovers calls the barvers routines upon f for each library
bar that it uses. Otherwise foovers just returns.

Then you can accomplish different results by plugging in different f's.
If you don't want to recurse, for instance, you just supply a function
that prints its argument and returns 0. If you want to print strings
like what(1), you supply a function that stores its argument in a hash
table and returns 1 if the argument hasn't been seen before. Then you
print all the strings in the table. If you want to trace package
dependencies, you can do that too.
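
A minimal sketch of that scheme in ANSI C, assuming two hypothetical
libraries, foo (which uses bar) and bar (which uses nothing else).  The
version strings, the vers_fn typedef and the callback names are
illustrative only.  A small fixed-size array stands in for the hash
table, which is a simplification and not part of the scheme itself:

```c
#include <stdio.h>
#include <string.h>

/* Callback type: receives a version string; returns 1 to recurse, 0 to stop. */
typedef int (*vers_fn)(const char *version);

/* Hypothetical leaf library "bar": it uses no other libraries. */
void barvers(vers_fn f)
{
    (void)f("bar 2.0");     /* nothing below bar, so the result is ignored */
}

/* Hypothetical library "foo", which is built on bar. */
void foovers(vers_fn f)
{
    if (f("foo 1.3"))       /* report our own version first */
        barvers(f);         /* recurse only if the callback asks for it */
}

/* Callback 1: print the string and do not recurse. */
int print_only(const char *version)
{
    puts(version);
    return 0;
}

/* Callback 2: remember each distinct string; recurse only on first sight. */
#define MAXSEEN 32
static const char *seen[MAXSEEN];
static int nseen = 0;

int collect_unique(const char *version)
{
    int i;

    for (i = 0; i < nseen; i++)
        if (strcmp(seen[i], version) == 0)
            return 0;       /* already recorded, so do not descend again */
    if (nseen < MAXSEEN)
        seen[nseen++] = version;
    return 1;
}
```

Calling foovers(print_only) prints only foo's own string.  Calling
foovers(collect_unique) leaves both strings in seen[], and a later
loop over seen[0..nseen-1] can print them in the style of what(1).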

---Dan

davidc@vlsisj.uucp (David Chapman) (05/11/91)

In article <8874:May914:42:5791@kramden.acf.nyu.edu>,
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
|> In article <1477@ecicrl.ocunix.on.ca> clewis@ferret.ocunix.on.ca (Chris Lewis) writes:
|> > >It's a whole bunch more portable and useful to give the user some
|> > >options to print out those strings. End of story.
|> > What exactly do you mean by this?  what(1)?
|> 
|> No. what(1) is not supported by the language or the complete programming
|> environment, so it is inherently unreliable.
|> 
|> > A -V option?
|> 
|> All that's important for this problem is that each library provide a
|> routine that returns a version string.

Try this:  write in C++ and define a class whose constructors link themselves
into one list.  It should have a member function that can traverse the list 
printing the version string which is the argument to the constructor.  You put 
one of these version objects into each source file (outside any functions) 
with its version number.  Then your "version" function simply invokes this 
printing function on its own version variable, and voila! - all your version 
strings get dumped to the screen or a convenient file.  You don't need to 
know the variable names, and the system extends itself every time you add 
another file.

Here's an example that I just whipped up.  I haven't tried to compile or
run it yet, so use it at your own risk.  :-)  It's free for the taking.

/************************** declaration file *************************/

class fileversion {
    public:
        fileversion(const char *ver);
        void printversions(void);  /* extend as you see fit */
    private:
        const char *_ver;       /* const, to match the constructor argument */
        static fileversion *firstver;
        fileversion *nextver;
};

/************************* implementation file ***********************/

#include <stdio.h>              /* for printf */

/* list terminator */
fileversion *fileversion::firstver = 0;

fileversion::fileversion(const char *ver)
{
    _ver = ver;                 /* save file version */
    nextver = firstver;         /* link ourselves in */
    firstver = this;
}

void fileversion::printversions(void)
{
    fileversion *cur;           /* could be fancier, obviously */

    for (cur = firstver; cur != 0; cur = cur->nextver)
        printf("%s\n",cur->_ver);  /* works since this is a member function */
}

/*************************** sample usage ***************************/

/* "static" in declaration optional and irrelevant */
static fileversion anyvariablename("file foo.cpp 3.1.2 10-May-91");

void printallversions(void)
{
    anyvariablename.printversions();
}

/*************************** end of example ************************/

The list is self-constructing; all of the constructors are called via
the linker so you don't need to do it yourself!  This is also why you
don't need to know the names of the variables.  And the optimizer is
guaranteed not to optimize it away!

It's probably not a good idea to inline the constructor.  I found this
out the hard way.  Zortech C++ constructors seem to be a minimum of
200 bytes, and on a DOS machine memory is tight.  :-(

Thanks for the idea, Dan!

                David Chapman

{known world}!decwrl!vlsisj!davidc
vlsisj!davidc@decwrl.dec.com