[comp.lang.c++] Including header files minimally.

dld@f.gp.cs.cmu.edu (David Detlefs) (11/12/88)

There have been a number of interesting suggestions on how to go about
including header files only as needed.  Someone suggested including
them using something like an 

#include_once

directive.  It seems to me that the "include-once-ability" of a header
file ought to be a property of the file, independent of the context in
which it is included.  This would favor putting something in the file
itself.  The "#ifndef/#define/<body>/#endif" style, while not pretty, is
certainly sufficient and requires no modifications to cpp.  Every so
often, someone writes an include file that is really meant to be
included multiple times -- it contains #ifdef's that depend on symbols
whose values are changed between inclusions.  Files of this kind could
not be include-once.  If we did want to modify cpp, though, we could
take care of both situations minimally by recording as we process the
inclusion of a file all the preprocessor symbols used in #ifdef's and
the values they had at the time of this inclusion.  We would store
this list in a table, and when we next included the file, check the
current values of those symbols.  If they are the same as before,
don't include the file, otherwise do.
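
As a minimal sketch of the "#ifndef/#define/<body>/#endif" style, the text of a hypothetical header can be pasted twice into one file, which is what two #include lines would do. POINT_H, struct point and point_sum are names invented for this illustration.

```c
/* Simulates two inclusions of a hypothetical "point.h" by repeating its
 * guarded text. POINT_H and struct point are invented for this sketch. */

/* ---- first "inclusion" ---- */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif

/* ---- second "inclusion": POINT_H is now defined, so the body is
 * skipped and struct point is not redefined ---- */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif

/* Uses the single definition that survived preprocessing. */
int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Without the guard, the second copy would redefine struct point and the compile would fail.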

It would be nice if you could precompile each header file to get a
symbol table (both preprocessor and language tables -- this scheme
would require an integrated preprocessor/compiler) that you would dump
to a file.  Maybe you would call this a ".hx" file.  Whenever you
include a .h file, check for a corresponding .hx file in the same
directory.  If you find it, look at the .hx file; in there, you've
recorded this same list of external preprocessor symbols it depends on
and the values for those that were used when this was compiled.  If
they agree with the current values for those symbols, then you just
read in the file.  (I have some ideas about how to dump/undump the
symbol table, but that's another discussion.)

I think we really need something like this to get decent compilation
performance.  The project I work on requires all programs written in
the language we're writing to include a file that expands to
upwards of 3500 lines of code.  Compiles take on the order of 5
minutes on an IBM RT/PC.  Not very good.  I'd like to see if some
scheme like the above could make this better.

-- 
Dave Detlefs			Any correlation between my employer's opinion
Carnegie-Mellon CS		and my own is statistical rather than causal,
dld@cs.cmu.edu			except in those cases where I have helped to
				form my employer's opinion.  (Null disclaimer.)
--

jim@athsys.uucp (Jim Becker) (11/15/88)

From article <3561@pt.cs.cmu.edu>, by dld@f.gp.cs.cmu.edu (David Detlefs):
>
	[stuff deleted]
> 
> It would be nice if you could precompile each header file to get a
> symbol table (both preprocessor and language tables -- this scheme
> would require an integrated preprocessor/compiler) that you would dump
> to a file. [clipped]
> 

	Something fairly equivalent to this existed in the Amiga
world, and its net effect was excellent. I found that using this type
of approach made development on a 68000/floppy based Amiga faster than
an 80286/harddisk system! This was part of the Manx development system
(aka - Aztec C).

	The implementation consisted of having a special option for
the compiler that would take the symbol table that it currently had
and output the content as a "library" file. Not a standard type
library, but a library of symbols that were known. This creation I did
using a C file that contained #include statements for all the standard
include files that I used in the system. They were then bundled into
this small binary format library.

	During subsequent compilation, the created symbol library file
was included in the command line. Any include files that were needed
during the compilation process were semantically extracted from
the symbol library when they were needed. Include files that were not
in the symbol library were processed normally.

	The result was a DRAMATIC improvement in the compilation
speed! I have been surprised that this concept hasn't yet come into
other products that I have crossed paths with, as it seems very
elegant and nice (w/ not much effort).

	We just received the Oregon C++ compiler, I hope they may
integrate something such as this into their compiler to help further
improve the speed of compilation!!


> 
> -- 
> Dave Detlefs		
> Carnegie-Mellon CS	


-Jim Becker

ok@quintus.uucp (Richard A. O'Keefe) (11/15/88)

In article <186@tityus.UUCP> jim@athsys.uucp (Jim Becker) describes
>a special option for
>the compiler that would take the symbol table that it currently had
>and output the content as a "library" file.

>I have been surprised that this concept hasn't yet come into
>other products that I have crossed paths with, as it seems very
>elegant and nice (w/ not much effort).

That's an _old_ technique!  Burroughs were using it _years_ ago on
their mainframe Algol, Fortran, COBOL, &c compilers.  It has been
described several times in the technical literature.  And of course
Simula 67 was doing fully type-checked separate compilation with
symbol-table files before C (plain C) was dreamed of.

jim@athsys.uucp (Jim Becker) (11/17/88)

From article <681@quintus.UUCP>, by ok@quintus.uucp (Richard A. O'Keefe):
> In article <186@tityus.UUCP> jim@athsys.uucp (Jim Becker) describes
>>a special option for
>>the compiler that would take the symbol table that it currently had
>>and output the content as a "library" file.
> 
>>I have been surprised that this concept hasn't yet come into
>>other products that I have crossed paths with, as it seems very
>>elegant and nice (w/ not much effort).
> 
> That's an _old_ technique!  Burroughs were using it _years_ ago on
> their mainframe Algol, Fortran, COBOL, &c compilers.  It has been
> described several times in the technical literature.  And of course
> Simula 67 was doing fully type-checked separate compilation with
> symbol-table files before C (plain C) was dreamed of.


	Ok -- I guess that it's been around for a long time, I haven't
worked with any Burroughs equipment and hadn't seen it w/ the machines
that I have worked on.

	If this is such a time honored technique, the next question is
"Why isn't it used everywhere?". Like I said, my experience has only
seen it implemented on the Amiga, but I have worked with just about
all the DEC line, Data General line (thru 83), Sun and others. Is there
some legal problem with people putting it into the compiler? This is
a fairly simple, and I guess widely used in some circles, technique
that could greatly aid this include dependency problem of C++. Why
isn't this already part of the environment?

-Jim Becker

rfg@nsc.nsc.com (Ron Guilmette) (11/17/88)

In article <3561@pt.cs.cmu.edu> dld@f.gp.cs.cmu.edu (David Detlefs) writes:
>There have been a number of interesting suggestions on how to go about
>including header files only as needed...
>...  It seems to me that the "include-once-ability" of a header
>file ought to be a property of the file, independent of the context...
>...Every so often, someone writes an include file that is really meant to be
>included multiple times...

Ever heard of unix links?

>...If we did want to modify cpp, though, we could
>take care of both situations minimally by recording as we process the
>inclusion of a file all the preprocessor symbols used in #ifdef's and
>the values they had at the time of this inclusion.  We would store
>this list in a table, and when we next included the file, check the
>current values of those symbols.  If they are the same as before,
>don't include the file, otherwise do.

I think that there is semantic muddy-ness to the term "at the time of
this inclusion".  When is that exactly?  At the *start* of the inclusion,
somewhere in the middle, or at the end?  Keep in mind that the defined
values may actually change as a result of the inclusion itself.

My idea was far simpler and far less expensive in compile time.  I
suggested (and still suggest) that cpp could be taught to remember a
list of "already included" filenames.  It could then ignore additional
#include's for any file in this list.  You could easily override this
simple logic by making multiple links to individual header files which
must be included multiple times.
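
A minimal sketch of the "already included" list, under the assumption that files are compared by the name given to #include. should_include, seen and MAX_SEEN are hypothetical names, not taken from any real cpp.

```c
#include <string.h>

#define MAX_SEEN 256

/* Names of files already included; a fixed-size table keeps the sketch short. */
static const char *seen[MAX_SEEN];
static int seen_count = 0;

/* Returns 1 if the file should be read (first request), 0 if it should be
 * skipped. Only the pointer is stored, so callers must keep `name` alive. */
int should_include(const char *name)
{
    int i;
    for (i = 0; i < seen_count; i++) {
        if (strcmp(seen[i], name) == 0)
            return 0;               /* already included: ignore */
    }
    if (seen_count < MAX_SEEN)
        seen[seen_count++] = name;  /* remember for next time */
    return 1;                       /* a full table falls back to including */
}
```

Because the comparison is by name, a second link to the same file (say "x_link.h" for "x.h") counts as a different entry and is read again, which is how the link override would work.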

>It would be nice if you could precompile each header file to get a
>symbol table (both preprocessor and language tables -- this scheme
>would require an integrated preprocessor/compiler) ...

I think that you may want to switch over to comp.lang.ada ;-)

>I think we really need something like this to get decent compilation
>performance.

Yes.  That is an issue isn't it.  All the more reason to think about
my suggestion.

-- 
Ron Guilmette
National SemiConductor, 1135 Kern Ave. M/S 7C-266; Sunnyvale, CA 94086
Internet: rfg@nsc.nsc.com   or   amdahl!nsc!rfg@ames.arc.nasa.gov
Uucp: ...{pyramid,sun,amdahl,apple}!nsc!rfg

dld@f.gp.cs.cmu.edu (David Detlefs) (11/19/88)

To Ron Guilmette -- I couldn't seem to send you mail --

I think your idea is a very good quick-and-dirty solution -- easy to
understand, easy to implement.  If someone were to take the sources to
some public domain cpp and implement it tomorrow (which is about all it
would take!), it would get a lot of use.  People *shouldn't* write
include files that are intended to be included multiple times with
different results on each inclusion; that is an obscure practice at
best.  Yet people do do this; at least they have, in include files that
we may have to keep including forever in the name of backward
compatibility.  Your solution, with the links, works.  However, I would
hope that you would agree that it is at least somewhat unsatisfying.

I think your solution and mine fall on a continuum; yours is very easy
to implement, but requires the user to perform two actions he or she
wouldn't normally have to do: use the new flag to cpp (or use the flag
to turn it off when necessary, if not including multiply is made the
default), and make extra links to certain files.  Note that you run
the risk of always using the "don't include again mode," and forgetting
to turn it off when you include a file from an external library that
includes a file that includes file x.h 7 times, each time with a
different preprocessor environment.

My solution requires somewhat more work, but in my opinion, not too
much, and solves all problems, without, I think, too much cost in
performance.  I don't think it is "semantically muddy."  To explain it
again: you include a file.  If this file contains no #if-like
constructs, it will generate the same code every time it is included,
so it will never need to be included again.  If it does have an
#if-like construct, then it may generate different code if it is
included again at a time when the preprocessor symbols that are used in
the #if have different values.  (Obviously, if we #define one of these
symbols before using it, it doesn't depend on the external value.)
While we process the #include file for the first time, we record all
the externally defined symbols used in #if's, and their values when
they were first used.  If we encounter an #include of this file again,
we include it only if one or more of those symbols has a different
value.
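
The bookkeeping described above might look roughly like the following sketch. It covers only symbols tested in #if-like constructs, as the scheme states. struct dep, struct macro and needs_reinclude are hypothetical names, and the macro table is a stand-in for whatever cpp keeps internally.

```c
#include <string.h>

/* One recorded dependency: a symbol tested in an #if and the value it had
 * when first used (NULL means "was not defined"). */
struct dep {
    const char *symbol;
    const char *value;
};

/* Hypothetical stand-in for cpp's macro table. */
struct macro {
    const char *name;
    const char *value;
};

/* Current value of `name`, or NULL if it is not defined. */
static const char *lookup(const struct macro *table, int n, const char *name)
{
    int i;
    for (i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].value;
    return NULL;
}

/* Two values match if both are undefined or both have the same text. */
static int same_value(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Returns 1 if any recorded symbol now has a different value, else 0. */
int needs_reinclude(const struct dep *deps, int ndeps,
                    const struct macro *table, int nmacros)
{
    int i;
    for (i = 0; i < ndeps; i++)
        if (!same_value(deps[i].value, lookup(table, nmacros, deps[i].symbol)))
            return 1;
    return 0;
}
```

A file with no recorded dependencies always yields 0 here, which matches the claim that such a file is never included again.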

This solution requires no special flags, and always gets it right.
Your solution has the drawbacks I mentioned, but has the virtue of
being simpler to implement, and may be somewhat faster.  Both, I think,
are valid ideas.

Trying to keep the intellectual discourse friendly...



gsf@ulysses.homer.nj.att.com (Glenn Fowler[eww]) (11/19/88)

In article <3614@pt.cs.cmu.edu>, dld@f.gp.cs.cmu.edu (David Detlefs) writes:
> ...
> performance.  I don't think it is "semantically muddy."  To explain it
> again: you include a file.  If this file contains no #if-like
> constructs, it will generate the same code every time it is included,
> ...

this breaks with the following simple example:
x.h:
	extern int	VARIABLE;
tst.c:
	#include "x.h"
	#define VARIABLE	abc
	#include "x.h"

a proper implementation would have to keep a list of all identifiers
referenced within a header to determine if a repeated include would
produce different results -- the problem is compounded by nested
includes
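
The example can be tried in a single file by pasting the text of x.h twice in place of the two #include lines. This is only a sketch: the #undef, the two definitions and sum_of_both are additions made so that the result compiles and links on its own.

```c
/* Contents of x.h pasted twice to stand in for the two #include lines. */

extern int VARIABLE;        /* first "inclusion": declares VARIABLE */

#define VARIABLE abc

extern int VARIABLE;        /* second "inclusion": now declares abc */

#undef VARIABLE             /* not in the original; lets the code below name both */

/* Definitions added only so the sketch links. */
int VARIABLE = 1;
int abc = 2;

int sum_of_both(void)
{
    return VARIABLE + abc;  /* both declarations are visible here */
}
```

The header contains no #if at all, yet its second inclusion declared a different variable.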
-- 
Glenn Fowler    (201)-582-2195    AT&T Bell Laboratories, Murray Hill, NJ
uucp: {att,decvax,ucbvax}!ulysses!gsf       internet: gsf@ulysses.att.com

campbell@redsox.UUCP (Larry Campbell) (11/21/88)

In article <3614@pt.cs.cmu.edu> dld@f.gp.cs.cmu.edu (David Detlefs) writes:

}                                     ...  People *shouldn't* write
}include files that are intended to be included multiple times with
}different results on each inclusion; that is an obscure practice at
}best.

I must disagree.  We regularly use the following idiom, which I think
is quite defensible, for managing user and error message definitions in
our products:

Messages are defined in a header file that looks like this:

------------------------------message.h------------------------------
#ifdef GenerateTable
#define MSG(name, string) char name[] = string;
#else
#define MSG(name, string) extern char name[];
#endif

MSG(err_you_lose, "you lose big time, pal")
MSG(err_get_real, "get real, pal")
MSG(err_swap_mad, "swap read error, you lose your mind")
---------------------------end of message.h--------------------------

Modules that refer to message definitions just include message.h.
One very small module then defines the messages:

------------------------------message.c------------------------------
#define GenerateTable
#include "message.h"
---------------------------end of message.c--------------------------

We can then produce foreign language versions of our products more easily
because all the messages are confined to one file, and we can also easily
produce an appendix for the documentation that lists all possible error
messages.
-- 
Larry Campbell                          The Boston Software Works, Inc.
campbell@bsw.com                        120 Fulton Street
wjh12!redsox!campbell                   Boston, MA 02146

tuck@alanine.cs.unc.edu (Russ Tuck) (11/21/88)

In article <3614@pt.cs.cmu.edu> dld@f.gp.cs.cmu.edu (David Detlefs) writes:
>...My solution ... To explain it
>again: you include a file.  If this file contains no #if-like
>constructs, it will generate the same code every time it is included,
>so it will never need to be included again.  

This isn't true.  An included file may generate different code if any
preprocessor definitions have changed or been added.  As a trivial example,
consider a triv.h file containing only "int fn();", and the following fragment:

#include "triv.h"
#define int unsigned long
#include "triv.h"

Without a good reason and documentation, this is certainly highly questionable
code, but it shows that *any* identifier or keyword can be redefined.  You 
can't be sure that an include file will generate the same code twice unless 
the preprocessor definitions have not changed at all (and even this is 
probably not sufficient to make a guarantee).

Let me give a motivating example for code like this.  I am implementing a 
hierarchy of mathematical types, derived from a single base, and need to 
provide math and logic expressions for all these classes.  Unfortunately, 
just providing all the operator methods for the base class is not sufficient.
The base class methods return base class values, which can not be assigned 
to derived class objects (because the additional info contained in the 
derived class was lost when the operator method converted the result to 
the base class).  So, I have to redefine all the operator methods for each 
derived class.  To avoid many potential typographic and minor logic errors, 
I do this with a single include file with declarations something like:

type operator+(type)

Then the library user includes a single library definition .h file, which
includes the above file multiple times, something like this:

#define type base
#include "my_math_defs.h"
#undef type
#define type derived
#include "my_math_defs.h"

(This is greatly simplified from the real code, but gives a feel for one
case where a .h file can usefully generate different code without #if's.)
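
A rough C analogue of this idiom, as a single-file sketch. C has no operator overloading, so a hypothetical `suffix` macro and token pasting (CAT) are assumed here to give each generated function its own name. None of these names come from the real code.

```c
/* Two-level paste so that `suffix` is expanded before ## is applied. */
#define CAT2(a, b) a##b
#define CAT(a, b) CAT2(a, b)

/* ---- first "inclusion" of the hypothetical my_math_defs.h ---- */
#define type int
#define suffix i
type CAT(add_, suffix)(type a, type b) { return a + b; }  /* becomes add_i */
#undef type
#undef suffix

/* ---- second "inclusion": identical text, different definitions ---- */
#define type double
#define suffix d
type CAT(add_, suffix)(type a, type b) { return a + b; }  /* becomes add_d */
#undef type
#undef suffix
```

The function line is the same text both times; only the surrounding macro definitions differ, and no #if is involved.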

>...While we process the #include file for the first time, we record all
>the externally defined symbols used in #if's, and their values when
>they were first used.  If we encounter an #include of this file again,
>we include it only if one or more of those symbols has a different
>value.
>
>This solution requires no special flags, and always gets it right.

No. It's not this simple.  For complete safety, it must be included again
if anything about the set of preprocessor definitions has changed (or perhaps
even if the #include just comes in a different code context).

Russ Tuck		               internet: tuck@cs.unc.edu
Computer Science Dept., Sitterson Hall csnet:    tuck@unc
University of North Carolina           uucp:     {ihnp4|decvax}!mcnc!unc!tuck
Chapel Hill, NC 27599-3175, USA        Phone:    (919) 962-1755 or 962-1932

dld@f.gp.cs.cmu.edu (David Detlefs) (11/22/88)

I got a couple of interesting responses to my last post on this
subject, on which I'd like to comment.  First, Glenn Fowler quite
correctly points out that the simple example

>x.h:
>        extern int      VARIABLE;
>tst.c:
>        #include "x.h"
>        #define VARIABLE        abc
>        #include "x.h"

breaks my scheme (which was, briefly, to record what preprocessor
variables an include file depends on, and their values at the point of
first inclusion, and then only reinclude if those values have
changed.)  Glenn's point is that there is no way to determine what
identifiers are preprocessor variables the first time through; any
random #define may make what was a perfectly good identifier into a macro.
I see no way to save my plan from this fatal flaw, and hereby abandon
it publicly (insert "Taps" here...)  (These kinds of sick examples point
out the essential corruption of cpp 1/2 :-)  I abandon it to the
extent that it was supposed to do everything right always while requiring no
new semantics and minimal performance cost.  It still may be a useful
heuristic, as stated, to compete with the "insert an #include_once
declaration in the header file (or in the includer)" or "tell cpp to
just include once."

A final note: Larry Campbell protests my classification of the
practice of including files multiple times in the same program as
"obscure at best," putting forth the following example:

>------------------------------message.h------------------------------
>#ifdef GenerateTable
>#define MSG(name, string) char name[] = string;
>#else
>#define MSG(name, string) extern char name[];
>#endif
>
>MSG(err_you_lose, "you lose big time, pal")
>MSG(err_get_real, "get real, pal")
>MSG(err_swap_mad, "swap read error, you lose your mind")
>---------------------------end of message.h--------------------------

I don't think this is bad usage; I probably didn't make it clear that
the practice I thought was bad was the inclusion of the same file
multiple times in the same *compilation unit* with different intended
results.  I assume that you would only need to include message.h once
per .c file, with GenerateTable undefined in all but one of the
.c files.  Fine, no problem; what I would have objected to would be
doing this using something like


------------------------------message1.h-----------------------------
#ifdef GenerateTable
#define MSG(name) char name[] = MSG_STRING;
#else
#define MSG(name) extern char name[];
#endif
---------------------------end of message1.h--------------------------
------------------------------message.h------------------------------
#define MSG_STRING "you lose big time, pal" 
#include <message1.h>
MSG(err_you_lose)

#define MSG_STRING "get real, pal"
#include <message1.h>
MSG(err_get_real)

#define MSG_STRING "swap read error, you lose your mind"
#include <message1.h>
MSG(err_swap_mad)
---------------------------end of message.h--------------------------

I trust you will agree that this is pretty obscure (at best)!


prh@actnyc.UUCP (Paul R. Haas) (11/22/88)

In article <10873@ulysses.homer.nj.att.com> gsf@ulysses.homer.nj.att.com (Glenn Fowler[eww]) writes:
>In article <3614@pt.cs.cmu.edu>, dld@f.gp.cs.cmu.edu (David Detlefs) writes:
>> ...
>> [David discusses his scheme for avoiding including files too often.]
>> ...
>
> [Glenn shows a flaw with David's method.]

I think I have a solution which is ANSI compatible and requires minimal
source code changes (some people write their header files this way already).

Make the C preprocessor recognize files of the form:
	comments and whitespace
	#ifndef SOME_SYMBOL
	any properly nested stuff, hopefully including "#define SOME_SYMBOL"
	#endif
	comments and whitespace
And only include them a second time if the symbol is not defined.  The
preprocessor would have to keep a table of include file names and
symbols that they depend on.  Obviously, this should be turned off if
the preprocessor is sending comments through.

Advantages:
  o Code will behave the same way with preprocessors without this feature.
  o No #pragmas required
  o Easy to explain to people
Disadvantages:
  o In order for this to speed things up, most header files should be
    written in a particular form.
  o I am not sure how much effort this will take.  It looks like I will
    end up putting some ugly hacks into cccp.c.
  o Will slightly slow down the preprocessor for header files which lack
    the #ifndef ... #endif structure.
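
The table of file names and guard symbols could be sketched as below. This is an illustration under assumptions, not code from cccp.c: remember_guard, can_skip and the is_defined hook are hypothetical names, and detecting the "#ifndef ... #endif" form in the first place is left out.

```c
#include <string.h>

#define MAX_FILES 128

/* For each header seen in the "#ifndef SYMBOL ... #endif" form, remember
 * which symbol guards it. */
struct guarded {
    const char *file;
    const char *guard;
};

static struct guarded table[MAX_FILES];
static int table_len = 0;

/* Called after a file has been read and found to have the guarded form. */
void remember_guard(const char *file, const char *guard)
{
    if (table_len < MAX_FILES) {
        table[table_len].file = file;
        table[table_len].guard = guard;
        table_len++;
    }
}

/* Returns 1 if the file can be skipped without opening it: it is known to
 * be fully guarded and its guard symbol is currently defined.
 * `is_defined` is a hypothetical hook into the preprocessor's macro table. */
int can_skip(const char *file, int (*is_defined)(const char *symbol))
{
    int i;
    for (i = 0; i < table_len; i++)
        if (strcmp(table[i].file, file) == 0)
            return is_defined(table[i].guard);
    return 0;   /* unknown file: it must be read */
}

/* Toy macro table for trying the sketch: only FOO_H counts as defined. */
int demo_is_defined(const char *symbol)
{
    return strcmp(symbol, "FOO_H") == 0;
}
```

If the guard symbol has been undefined in the meantime, can_skip returns 0 and the file is read again, so behaviour matches a preprocessor without the feature.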

Feel free to point out something that I missed.  I am going to look into
adding this to the Gnu preprocessor, cccp.  If anyone else is doing something
similar, let me know.
-----
Paul Haas	uunet!actnyc!prh or prh@frith.egr.msu.edu
(212) 696-3653

rfg@nsc.nsc.com (Ron Guilmette) (11/24/88)

In article <3614@pt.cs.cmu.edu> dld@f.gp.cs.cmu.edu (David Detlefs) writes:
>To Ron Guilmette -- I couldn't seem to send you mail --

Several people have told me this.  I don't know what's happening but I
intend to find out!

>I think your idea is a very good quick-and-dirty solution -- easy to
>understand, easy to implement.  If someone were to take the sources to
>some public domain cpp and implement it tomorrow (which is about all it
>would take!), it would get a lot of use.

I agree that what I proposed is "quick"; I disagree that it is "dirty".
I'll see if I can find the time to make the necessary mods to GNU cpp soon.
If I do get it done, I will post the necessary patches here and in the
GNU C and GNU C++ newsgroups.  The diffs ought to be reasonably small.

>People *shouldn't* write
>include files that are intended to be included multiple times with
>different results on each inclusion; that is an obscure practice at
>best.  Yet people do do this; at least they have, in include files that
>we may have to keep including forever in the name of backward
>compatibility.

I generally agree that this seems to be a questionable practice, but I'd
rather be a bit less judgemental about it.  If people have found what they
think are good reasons for doing this, then, to quote the Beatles, "Let it be".

>Your solution, with the links, works.  However, I would
>hope that you would agree that it is at least somewhat unsatisfying.

No I don't agree.  It tastes great!  No.  It's less filling!  Tastes great!
Less filling! ...

Well, OK.  It is a little kludgy, but then all of cpp is just a big kludge
which people keep on using year after year because it does some things very
well which cannot yet be done in any other ways.

>I think your solution and mine fall on a continuum...

That sounds painful.  I would rather fall on my sword than on a continuum!

>... yours is very easy
>to implement, but requires the user to perform two actions he or she
>wouldn't normally have to do: use the new flag to cpp (or use the flag
>to turn it off when necessary, if not including multiply is made the
>default), and make extra links to certain files.  Note that you run
>the risk of always using the "don't include again mode," and forgetting
>to turn it off...

OK.  The problems you point out are very valid.  Looks like it is time
for me to switch to plan B.

In retrospect, I admit that the notion of an extra flag to cpp (i.e. the
don't do multiple includes flag) was a dumb idea.  As you have noted, it
lacks true backward compatibility in the purest sense (i.e. not even having
to edit your Makefiles).  The multiple links to get multiple inclusions of
the same single file was also a dumb idea.

Plan B:

So let's fix cpp to do the following.  First, under "normal" circumstances,
the new cpp will do just what it has traditionally done, i.e. include EACH
and EVERY file called for in EACH and EVERY #include directive.  Now for
the trick.  Pick one of the UNIX protection mode bits which is typically
NEVER USED and totally INSIGNIFICANT (at least for normal source code files).
Just for the sake of argument, let's pick the "set-gid" bit (i.e. 02000).
Now let's say that we modify cpp so that it will *NOT* re-include any header
file which has its set-gid bit set.  Ta da!  Presto!  No more tricky links,
full backwards compatibility, and best of all, the ability to control
re-inclusions on a file-by-file basis in a very easy and simple manner
which does *not* require any changes to Makefiles or any other files.

The above scheme is simple to implement, simple to use, and simple to
understand.  The semantics are crystal clear and it can be slowly worked
into existing code (both C and C++) on an as-needed basis.
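
A minimal sketch of the check cpp would need, assuming a POSIX system where stat() and S_ISGID are available. The two function names are hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Returns 1 if the mode bits carry the set-gid mark (02000), else 0. */
int mode_marks_include_once(mode_t mode)
{
    return (mode & S_ISGID) != 0;
}

/* Returns 1 if `path` is marked include-once, 0 if not, -1 if stat() fails. */
int file_is_include_once(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return mode_marks_include_once(st.st_mode);
}
```

A header would be marked with something like `chmod g+s header.h`, and cpp would call the check before re-reading a file it has already included.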

Although it would seem that this scheme is heavily dependent on UNIX-specific
file modes, I believe that most operating systems have at least some ways
of tagging individual files with one more meaningful bit of information.
Specifically, MS-DOS also has a per-file mode, and I believe that VMS does
also.

>My solution requires somewhat more work...
>...I don't think it is "semantically muddy."...

Is my suggested approach "muddy"?  I don't see how?

>again: you include a file.  If this file contains no #if-like
>constructs, it will generate the same ***code*** every time it is included,
>so it will never need to be included again.

It is *NOT* true that, if there are no #if's in a given header file, then
you will get the same "code" (let alone the same "effect") from each inclusion
of that file.  Note the following:

	/* something.h  */

	int my_array[FOOBAR];

There are no #if's in this file, but the code can still be different on
different inclusions if the value of FOOBAR changes.

More significant is that the "effect" of a header file (without any #if's)
can be different for different inclusions, i.e.:

	#define NEW_FOOBAR	FOO##BAR

>[My] solution requires no special [cpp] flags, and always gets it right.
>Your solution has the drawbacks I mentioned, but has the virtue of
>being simpler to implement, and may be somewhat faster.

As described above, I have figured out how to easily avoid extra cpp flags.
As you note, my approach is simple and fast.  I rest my case.

ok@quintus.uucp (Richard A. O'Keefe) (11/24/88)

Instead of tinkering with the definition of an existing construct,
why not do the sanitary thing and add a new one?
	#use {filename}
resolve a "filename" or <filename> as #include would, and if that
resolved file name has already been #included or #used, do nothing;
otherwise #include it.  Easy to do, can't break old code, doesn't
require changes to the file being used.  In fact, if you record
the usage of the filename when the #include or #use is _started_,
you can even have files which #use each other (this can be useful).
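
Why recording at the start matters can be shown with a small simulation. The three "files" and their #use relationships below are invented for the sketch; use_file stands in for the preprocessor handling one #use.

```c
#define NFILES 3

/* uses[i][j] != 0 means hypothetical file i contains a "#use" of file j.
 * Files 0 and 1 use each other; file 2 uses file 0. */
static const int uses[NFILES][NFILES] = {
    {0, 1, 0},
    {1, 0, 0},
    {1, 0, 0}
};

static int started[NFILES];   /* set when processing of a file begins */
static int processed_count;   /* how many files were actually read */

/* Process file `f` unless it has already been started. */
void use_file(int f)
{
    int j;
    if (started[f])
        return;               /* already #included or #used: do nothing */
    started[f] = 1;           /* record at the START, before any nested #use */
    processed_count++;
    for (j = 0; j < NFILES; j++)
        if (uses[f][j])
            use_file(j);
}

int files_processed(void)
{
    return processed_count;
}
```

If started[f] were set only after the loop, files 0 and 1 would recurse into each other without end.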

greg@cantuar.UUCP (G. Ewing) (11/24/88)

Instead of including the same header file several times, what's
wrong with having the header file define a macro, including it
once, and then using the macro several times?

Using Larry Campbell (campbell@redsox.UUCP)'s example (sort of):

------------------------------message.h------------------------------
#define GENERATE_MESSAGES \
MSG(err_you_lose, "you lose big time, pal") \
MSG(err_get_real, "get real, pal") \
MSG(err_swap_mad, "swap read error, you lose your mind")
#undef MSG
---------------------------end of message.h--------------------------

------------------------------message.c------------------------------
#include "message.h"

#define MSG(name, string) char name[] = string;
GENERATE_MESSAGES

#undef MSG
#define MSG(name, string) extern char name[];
GENERATE_MESSAGES
---------------------------end of message.c--------------------------

Hmm... that perhaps wasn't the most useful of examples, but you get
what I mean, I hope. The point is that cpp could safely be hacked
to only ever include things once; anything that can be done with
multiply included files can also be done using macros.
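
A single-file sketch of the macro approach that compiles as it stands. The second expansion, which builds a table of pointers, is an assumption added here to show a use where the two expansions do different work; all_messages and message_count are invented names.

```c
/* The list is written once; each use supplies its own meaning for MSG. */
#define GENERATE_MESSAGES \
    MSG(err_you_lose, "you lose big time, pal") \
    MSG(err_get_real, "get real, pal") \
    MSG(err_swap_mad, "swap read error, you lose your mind")

/* First expansion: define the message strings. */
#define MSG(name, string) char name[] = string;
GENERATE_MESSAGES
#undef MSG

/* Second expansion: build a table of pointers to them. */
#define MSG(name, string) name,
char *all_messages[] = { GENERATE_MESSAGES };
#undef MSG

/* Number of messages in the list, computed from the table size. */
int message_count(void)
{
    return (int)(sizeof all_messages / sizeof all_messages[0]);
}
```

MSG is undefined between uses so that each new definition does not clash with the previous one.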

Can anyone think of something that couldn't?

(P.S. I'd allow a backward-compatibility flag, before anyone jumps
on me...)

Greg Ewing				Internet: greg@cantuar.uucp
Spearnet: greg@nz.ac.cantuar		Telecom: +64 3 667 001 x8357
UUCP:	  ...!{watmath,munnari,mcvax,vuwcomp}!cantuar!greg
Post:	  Computer Science Dept, Univ. of Canterbury, Christchurch, New Zealand
Disclaimer: The presence of this disclaimer in no way implies any disclaimer.

henry@utzoo.uucp (Henry Spencer) (11/26/88)

In article <738@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>Instead of tinkering with the definition of an existing construct,
>why not do the sanitary thing and add a new one?
>	#use {filename}
>... Easy to do, can't break old code, doesn't
>require changes to the file being used.

Well, leaving the existing construct alone (except perhaps in the presence
of a compiler flag) is the right thing to do, but "doesn't require changes
to the file being used" is a bug, not a feature.  In most cases, the fact
that #including a file n times is the same as including it once is a
property of the file being included, not the file doing the including.
The probability of trouble and mistakes will be much lower if it's the
file being included that determines whether future inclusions have any
effect.  The Waterloo "#pragma idempotent" strikes me as the right method;
among other things, it means that your code is portable.
-- 
Sendmail is a bug,             |     Henry Spencer at U of Toronto Zoology
not a feature.                 | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

ok@quintus.uucp (Richard A. O'Keefe) (11/26/88)

In article <867@cantuar.UUCP> greg@cantuar.UUCP (G. Ewing) writes:
>Instead of including the same header file several times, what's
>wrong with having the header file define a macro, including it
>once, and then using the macro several times?

It requires that the macro processor be able to handle very large
macros, which might not otherwise be required.

Also, it is not possible for the two approaches to be equivalent
(consider the built-in macros __FILE__ and __LINE__).
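
The __LINE__ difference can be seen in a small sketch: text that arrives through a macro is all attributed to the line of the macro's use, while the same text written out (as an included file would supply it) keeps its own line numbers. The names below are invented for the illustration.

```c
/* Both __LINE__ tokens come from one macro invocation on one source line. */
#define TWO_LINES \
    __LINE__, \
    __LINE__

static const int from_macro[2] = { TWO_LINES };

/* The same text written out directly spans two source lines. */
static const int from_text[2] = {
    __LINE__,
    __LINE__
};

/* Difference between the two recorded line numbers in each case. */
int macro_line_gap(void) { return from_macro[1] - from_macro[0]; }
int text_line_gap(void)  { return from_text[1] - from_text[0]; }
```

Anything that relies on line numbers, such as diagnostics pointing into the list, would therefore behave differently under the two approaches.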

>The point is that cpp could safely be hacked
>to only ever include things once; anything that can be done with
>multiply included files can also be done using macros.

But not using the existing sources.

If you don't like #include, use something else.  It is bad manners to
break other people's tools, without even asking them.

As an example of a company with good manners, the High C compiler has
a pragma which goes something like this:
	pragma   Include(<file>);	/* or   Include("file"); */
	pragma C_Include(<file>);	/* or C_Include("file"); */
where C_Include includes the file only if it hasn't already been included.
(This is "pragma", by the way, not "#pragma".)

Note that at least in UNIX systems, it is possible to
	#include "/dev/tty"
{This can actually be useful, if you know what you're doing.}
There are other kinds of files where the contents may be different,
_by design_, when you open them again.

ok@quintus.uucp (Richard A. O'Keefe) (11/27/88)

In article <1988Nov25.180309.9323@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <738@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>>	#use {filename}
>>... Easy to do, can't break old code, doesn't
>>require changes to the file being used.
>
>Well, leaving the existing construct alone (except perhaps in the presence
>of a compiler flag) is the right thing to do, but "doesn't require changes
>to the file being used" is a bug, not a feature.  In most cases, the fact
>that #including a file n times is the same as including it once is a
>property of the file being included, not the file doing the including.

Which is why "doesn't require changes to the file being used" is NOT
a design error.  If it is the case that including the file N times is
ok, then the file doesn't NEED _any_ changes.  If it is not the case
that including the file N times is ok, then a construct like this should
not be used.  What I had in mind is the situation where someone *else*
owns the header file in question, so that you _can't_ change it.  Ok,
you can get around that by interposing your own file which says
#pragma include_once; #include "the-real-file"
but you can just as easily do that by putting
	#ifndef	FOOBAZ_H
	#include "real_foobaz.h"
	#define	FOOBAZ_H 1
	#endif
in your "foobaz.h" file.  Actually, now that I come to think of it,
just how hard _is_ it to write a 4-line wrapper in order to make an
idempotent version of a header?

I'm afraid that it is not really the case that idempotence is "a property
of the file being included, not the file doing the including".  That is
true only if the file being included contains no identifiers which could
be #defined by the caller.  Suppose we have
	extern int strlen(char*);
in a file strlen.h.  Now consider
	#define int double
	#define strlen atof
	#include "strlen.h"
	#undef int
	#undef strlen
	#include "strlen.h"
Not a terribly reasonable thing to do, but sufficient to show that the
idempotence of even such an innocent header as that IS dependent on the
includer, so that it *is* advisable for the includer to explicitly
promise not to break it.  "In most cases", true, this doesn't happen,
but that is a property of the includer!

rfg@nsc.nsc.com (Ron Guilmette) (11/27/88)

In article <1988Nov25.180309.9323@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In most cases, the fact
>that #including a file n times is the same as including it once is a
>property of the file being included, not the file doing the including.

Absolutely right Henry.  I realized this very fact just before I suggested
using the set-uid bit on header files which are not to be included more
than once.

In hindsight though, I now realize that even that was a silly suggestion.
I now think that putting something like:

	#once

or

	#pragma once

into such header files would be more appropriate, more clear, and a more
portable solution.

>The probability of trouble and mistakes will be much lower if it's the
>file being included that determines whether future inclusions have any
>effect.

Exactly right.

>The Waterloo "#pragma idempotent" strikes me as the right method;
>among other things, it means that your code is portable.

Is this (semantically) the same thing as I am suggesting?  I have never
heard of this before!  If it has the effect of forcing the file it is
found within to only be included once then I guess that there is at least
one clear (although not yet standard) precedent for an approach to this
problem.

campbell@redsox.UUCP (Larry Campbell) (11/28/88)

In article <8080@nsc.nsc.com> rfg@nsc.nsc.com.UUCP (Ron Guilmette) writes:
}In article <1988Nov25.180309.9323@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
}In hindsight though, I now realize that even that was a silly suggestion.
}I now think that putting something like:
}
}	#once
}or
}	#pragma once
}
}into such header files would be more appropriate, more clear, and a more
}portable solution.
 ...
}>The Waterloo "#pragma idempotent" strikes me as the right method;
}>among other things, it means that your code is portable.

How portable can something be that's not even implemented yet???  If you
just wrap your header files like this:

	#ifndef foo_included
	 ...
	#define foo_included
	#endif

it achieves the result you desire, completely portably, today, in both
C++ and C, without requiring changes to preprocessors or funky #pragma
statements (Dr. Coggins' esthetic distaste for this much pragmatism
notwithstanding).

Really, I think this is a SOLVED PROBLEM that's been getting beaten to
death.  Could we move on to more interesting aspects of the import/export
question than how to get it done with the current syntax-oriented paradigm?
-- 
Larry Campbell                          The Boston Software Works, Inc.
campbell@bsw.com                        120 Fulton Street
wjh12!redsox!campbell                   Boston, MA 02146

akwright@watdragon.waterloo.edu (Andrew K. Wright) (11/29/88)

In article <8080@nsc.nsc.com> rfg@nsc.nsc.com.UUCP (Ron Guilmette) writes:
>In article <1988Nov25.180309.9323@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>>In most cases, the fact
>>that #including a file n times is the same as including it once is a
>>property of the file being included, not the file doing the including.
>>
>>The probability of trouble and mistakes will be much lower if it's the
>>file being included that determines whether future inclusions have any
>>effect.
>
>Exactly right.
>
>>The Waterloo "#pragma idempotent" strikes me as the right method;
>>among other things, it means that your code is portable.
>
>Is this (semantically) the same thing as I am suggesting?  I have never
>heard of this before!  If it has the effect of forcing the file it is
>found within to only be included once then I guess that there is at least
>one clear (although not yet standard) precedent for an approach to this
>problem.

I will expand a bit on my original posting about
	#pragma idempotent
(or #pragma once  as Ron Guilmette suggests).

When a file marked with #pragma idempotent is included the first time,
the preprocessor makes a note of this.  If the file is later included
again, the include directive is simply ignored.  Determining when the
file is included "again" means determining when two include paths are
equal.  You might want to go by physical file equality (i.e., compare
devices and i-nodes), pathname equality, or string equality.  The
Waterloo compiler uses simple string equality:  two include paths are
considered equal if strcmp() on the argument strings of the include
directives, modulo their quotes or <>s, reports they are equal.  We
think it is a bad idea to build operating-system-dependent features
such as links or i-nodes into the semantics of the language; C runs
on many non-UNIX systems without such features.  Besides, you will
never get the ANSI committee to agree to such.
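
The string-equality rule described above can be sketched roughly as
follows.  This is only an illustration under stated assumptions, not the
Waterloo compiler's code: the fixed-size table and the names
mark_idempotent, already_included and strip_delims are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_SEEN 64     /* hypothetical limit on remembered files */
#define MAX_NAME 256    /* hypothetical limit on name length */

static char seen[MAX_SEEN][MAX_NAME];
static int  nseen = 0;

/* Copy an #include argument without its surrounding "..." or <...>. */
static void strip_delims(const char *arg, char *out, size_t outsz)
{
    size_t n = strlen(arg);
    if (n >= 2 && ((arg[0] == '"' && arg[n - 1] == '"') ||
                   (arg[0] == '<' && arg[n - 1] == '>'))) {
        arg++;
        n -= 2;
    }
    assert(n < outsz);
    memcpy(out, arg, n);
    out[n] = '\0';
}

/* Called when a file reached through this #include argument turns out
   to contain "#pragma idempotent". */
void mark_idempotent(const char *arg)
{
    assert(nseen < MAX_SEEN);
    strip_delims(arg, seen[nseen], MAX_NAME);
    nseen++;
}

/* Nonzero if a later #include with this argument should be ignored.
   Plain strcmp(): "foo.h" and "./foo.h" count as different files. */
int already_included(const char *arg)
{
    char name[MAX_NAME];
    int i;

    strip_delims(arg, name, sizeof name);
    for (i = 0; i < nseen; i++)
        if (strcmp(seen[i], name) == 0)
            return 1;
    return 0;
}
```

With this rule, marking <stdio.h> makes a later "stdio.h" a no-op, while
"./stdio.h" would still be opened and read.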

Another poster noted that idempotency is not entirely an attribute of
the included file.  This is true; both the includer and the includee
must agree that the file is to have idempotent semantics, or someone
will get surprised.  Thus #pragma idempotent is more than just a statement
that redundant #includes of this file will be ignored; it asserts
that the user is NOT ALLOWED to redefine any names appearing in
such a file.  This matches the ANSI committee's intent for the standard
include files:  the user is not permitted to redefine anything in
<stdio.h> for instance.  (He is permitted to #undef macros to get
real functions).  <stdio.h> can therefore be marked with
#pragma idempotent, as can all the other standard include files.

In fact, it just occurs to me that CPP could mark identifiers brought
in by idempotent include files as non-redefinable.  This has the
obvious advantage that the ANSI rule
"<the several hundred> standard identifiers cannot be redefined"
becomes a compiler enforced rule, not just a vapor rule.
(Personally I hate the idea of several hundred reserved words, but
 the committee seems to be well on its way down that path.)

Lastly, #pragma idempotent  is upwards compatible.  It does not
break *any* (not just few) existing C programs.  It does have
the disadvantage that pragmas are not required (by ANSI) to be
supported by all compilers, so you may still have to wrap #ifndefs
around your include file bodies if you expect to have your code
ported to any C compiler.

Andrew K. Wright      akwright@watmath.waterloo.edu
CS Dept., University of Waterloo, Ont., Canada.

henry@utzoo.uucp (Henry Spencer) (11/30/88)

In article <562@redsox.UUCP> campbell@redsox.UUCP (Larry Campbell) writes:
>How portable can something be that's not even implemented yet???  If you
>just wrap your header files [in #ifndef]...
>it achieves the result you desire, completely portably, today, in both
>C++ and C, without requiring changes to preprocessors or funky #pragma
>statements... Really, I think this is a SOLVED PROBLEM ...

Not so.  The problem is that a header wrapped in #ifndef still needs to
be opened, read, and scanned completely every time, which is costly when
the #include relationships are complex and common header files get picked
up many, many times.  One can imagine a tricky compiler which notices
the wrapping and optimizes this case, but that's not going to be easy.

In practice, a header file using "#pragma idempotent" or whatever would
still include the #ifndef wrapper, for the sake of portability.  (There
is no requirement that #pragma be portable, but there *is* a requirement
that ANSI C implementations ignore unrecognized #pragmas.)
-- 
SunOSish, adj:  requiring      |     Henry Spencer at U of Toronto Zoology
32-bit bug numbers.            | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

darin@Apple.COM (Darin Adler) (12/07/88)

In article <1988Nov29.203751.26424@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
> In article <562@redsox.UUCP> campbell@redsox.UUCP (Larry Campbell) writes:
> >How portable can something be that's not even implemented yet???  If you
> >just wrap your header files [in #ifndef]...
> >it achieves the result you desire, completely portably, today, in both
> >C++ and C, without requiring changes to preprocessors or funky #pragma
> >statements... Really, I think this is a SOLVED PROBLEM ...
> 
> Not so.  The problem is that a header wrapped in #ifndef still needs to
> be opened, read, and scanned completely every time, which is costly when
> the #include relationships are complex and common header files get picked
> up many, many times.  One can imagine a tricky compiler which notices
> the wrapping and optimizes this case, but that's not going to be easy.

I have to disagree here, Henry. Writing a compiler (or preprocessor) that
notices the wrapping and optimizes it is extremely easy. While the preprocessor
is scanning a file it simply sets a "start of file" flag which is cleared
when anything besides a comment is seen. If an #ifndef occurs with this flag
set, the preprocessor symbol can be recorded in a table, along with the file name.
(Of course, another check to be sure that the #endif is the last non-comment
line in the file as well would be necessary, but that is left as a simple
exercise for the implementor.) Then, any time the same file is included,
that symbol can be checked...if it is defined, the file need not be opened.
Note that even if the system can't recognize "this.h" and <this.h> and
"/this.h" as the same file, there is no problem, since the file protects itself
from multiple inclusion.
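
A rough sketch of the detection step, assuming the whole header is
already in memory as one string.  The function name find_guard and its
helpers are hypothetical, and the scan is deliberately simplified: only
/* */ comments are skipped, and directives hidden inside comments or
continued lines in the body of the file are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Skip whitespace and C comments; return the next real text. */
static const char *skip_blank(const char *p)
{
    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (p[0] == '/' && p[1] == '*') {
            const char *end = strstr(p + 2, "*/");
            if (end == NULL)
                return p + strlen(p);   /* unterminated: treat as EOF */
            p = end + 2;
        } else {
            return p;
        }
    }
}

/* If p is at "#" [spaces] word, return a pointer past word, else NULL. */
static const char *directive(const char *p, const char *word)
{
    size_t n = strlen(word);
    if (*p != '#')
        return NULL;
    p++;
    while (*p == ' ' || *p == '\t')
        p++;
    if (strncmp(p, word, n) != 0 ||
        isalnum((unsigned char)p[n]) || p[n] == '_')
        return NULL;
    return p + n;
}

/* Return 1 and copy the guard symbol into guard[] if the text is wholly
   wrapped in #ifndef SYMBOL ... #endif; otherwise return 0. */
int find_guard(const char *text, char *guard, size_t guardsz)
{
    const char *p = skip_blank(text);      /* "start of file" region */
    const char *q = directive(p, "ifndef");
    size_t n = 0;
    int depth = 1;

    assert(guardsz > 0);
    if (q == NULL)
        return 0;
    while (*q == ' ' || *q == '\t')
        q++;
    while (isalnum((unsigned char)q[n]) || q[n] == '_')
        n++;
    if (n == 0 || n >= guardsz)
        return 0;
    memcpy(guard, q, n);
    guard[n] = '\0';

    /* Walk the remaining lines, tracking conditional nesting. */
    for (p = strchr(q, '\n'); p != NULL; p = strchr(p, '\n')) {
        p++;
        while (*p == ' ' || *p == '\t')
            p++;
        if (directive(p, "if") || directive(p, "ifdef") ||
            directive(p, "ifndef"))
            depth++;
        else if (depth == 1 &&
                 (directive(p, "else") || directive(p, "elif")))
            return 0;                       /* guard has another branch */
        else if ((q = directive(p, "endif")) != NULL && --depth == 0)
            return *skip_blank(q) == '\0';  /* nothing may follow */
    }
    return 0;                               /* matching #endif not found */
}
```

A preprocessor could store the returned symbol next to the file name
and, on a later #include of the same name, skip opening the file
whenever that symbol is still defined.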

> In practice, a header file using "#pragma idempotent" or whatever would
> still include the #ifndef wrapper, for the sake of portability.  (There
> is no requirement that #pragma be portable, but there *is* a requirement
> that ANSI C implementations ignore unrecognized #pragmas.)

Note that the above scheme is superior because it gets the same results as
the pragma without typing it, since you still plan to include the #ifndef
wrapper. I'm ready to recommend to our compiler folks that they implement
the scheme now!
--
Darin Adler					              AppleLink: Adler4
UUCP: {amdcad,decwrl,hoptoad,nsc,sun}!apple!darin     Internet: darin@Apple.com

henry@utzoo.uucp (Henry Spencer) (12/09/88)

In article <21780@apple.Apple.COM> darin@Apple.COM (Darin Adler) writes:
>... Writing a compiler (or preprocessor) that
>notices the wrapping and optimizes it is extremely easy...

How extremely easy or significantly difficult it is depends on how your
compiler is implemented.  However, I concede that I was overly pessimistic
about this, and it's probably the right approach.
-- 
SunOSish, adj:  requiring      |     Henry Spencer at U of Toronto Zoology
32-bit bug numbers.            | uunet!attcan!utzoo!henry henry@zoo.toronto.edu