[gnu.g++] A solution to the multiple inclusion problem

nagle@well.UUCP (John Nagle) (10/23/89)

      The problem is well-known.  It would be desirable if all include files
themselves included everything they needed, so that order of inclusion 
was taken care of automatically.  But, of course, this results in multiple
inclusion of the same files, which causes problems.  A common work-around
for this problem is to use a construct like the following in each include file:

	#ifndef XXX
	#define XXX
	...content...
	#endif

This works, but on the second inclusion, the file still has to be read and
parsed, at least by the level of processing that reads "#" statements.
(Many newer compilers do "#" processing in the same pass as main compilation,
so referring to a "preprocessor" in this context is not necessarily correct.)
With widespread use of this technique within library files, some files may
be read a large number of times, mostly to be ignored.  This slows compilation.
The problem is especially severe in large C++ programs, where large numbers of
header files are necessary, and nested header files are not at all uncommon.

      It's been proposed that the semantics of "#include" be changed to 
avoid all multiple inclusion.  But this is controversial, and would require
ANSI approval.

      I propose a solution via compiler optimization.
The compiler should behave as follows:

	1.  If, when reading an "included" file, there are
	    no non-comment statements before the first "#ifndef"
	    (if any), and no non-comment statements after the "#endif"
	    matching said "#ifndef", the compiler shall associate
	    the tag found on the "#ifndef" line with the name of the
	    "#include" file.

	2.  When processing an "#include" statement, if the file has
	    an associated tag as defined in 1) above, and the tag is
	    defined (in the sense of "#define") the file shall not be
	    included.

This is a completely compatible solution to the problem.  Old compilers
will compile include files written with the "ifndef" convention correctly,
but slowly, and new compilers will do it faster.  No standardization
action is required.  Any implementor can install this now, and speed up
their product.
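
      As a rough sketch of how rules 1 and 2 might sit inside a compiler,
assume two small lookup tables: one of currently defined tags, one mapping
include file names to their "#ifndef" tags.  Every name and size below
(define_tag, undef_tag, remember_guard, can_skip_include, MAX_ENTRIES,
MAX_NAME) is invented for illustration and is not taken from any real
compiler:

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 64   /* illustrative fixed limits, not from any real compiler */
#define MAX_NAME    64

/* Tags currently defined, as the "#" processor would track them. */
static char defined_tags[MAX_ENTRIES][MAX_NAME];
static int  defined_count;

/* Rule 1 table: include file name -> tag of its enclosing #ifndef. */
static struct { char file[MAX_NAME]; char tag[MAX_NAME]; } guards[MAX_ENTRIES];
static int guard_count;

/* Bounded copy that always leaves dst terminated. */
static void copy_name(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, MAX_NAME - 1);
    dst[MAX_NAME - 1] = '\0';
}

/* Index of tag in defined_tags, or -1 if it is not defined. */
static int find_defined(const char *tag)
{
    int i;
    for (i = 0; i < defined_count; i++)
        if (strcmp(defined_tags[i], tag) == 0)
            return i;
    return -1;
}

/* Called for "#define tag". */
void define_tag(const char *tag)
{
    if (find_defined(tag) < 0 && defined_count < MAX_ENTRIES)
        copy_name(defined_tags[defined_count++], tag);
}

/* Called for "#undef tag": move the last entry into the freed slot. */
void undef_tag(const char *tag)
{
    int i = find_defined(tag);
    if (i >= 0) {
        defined_count--;
        if (i != defined_count)
            copy_name(defined_tags[i], defined_tags[defined_count]);
    }
}

/* Rule 1: all of `file` was found to sit inside #ifndef tag ... #endif. */
void remember_guard(const char *file, const char *tag)
{
    if (guard_count < MAX_ENTRIES) {
        copy_name(guards[guard_count].file, file);
        copy_name(guards[guard_count].tag, tag);
        guard_count++;
    }
}

/* Rule 2: nonzero means the "#include" of `file` may be skipped unopened. */
int can_skip_include(const char *file)
{
    int i;
    for (i = 0; i < guard_count; i++)
        if (strcmp(guards[i].file, file) == 0)
            return find_defined(guards[i].tag) >= 0;
    return 0;
}
```

Because the tag is looked up afresh at each "#include", a later "#undef"
of the tag simply causes the file to be opened and read again, so the
visible behavior matches that of a compiler without the optimization.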

      One could argue for a more elegant but less compatible solution, but
the political hassles aren't worth it.

					John Nagle

tada@athena.mit.edu (Michael J Zehr) (10/24/89)

In article <14240@well.UUCP> nagle@well.UUCP (John Nagle) writes:
>[problem of multiple inclusion, generally solved by:]
>	#ifndef XXX
>	#define XXX
>	...content...
>	#endif
>This works, but on the second inclusion, the file still has to be read and
>parsed, at least by the level of processing that reads "#" statements.
>[slowing compilation, particularly in C++ programs]

>[proposed compiler optimization binding the 'XXX' defined above to the
>included file and not including it a second time.]

there's another solution you can manage without having to get your
compiler vendor to make such an extension:

<foo.h:>
#ifndef FOO
#define FOO
#include "foo_real.h"
#endif

admittedly there's an extra file open for the first time reading
through, and you still have to open files when #including the second
time, but it's not as bad as having to read the entire file in
(particularly if you have some really *large* files).  library files
could be changed to this with relatively little effort and no compiler
changes, and users' header files for large systems could be changed to
work faster *today* without having to wait for the vendors to decide to
do this.

(unless i'm missing something obvious and being really stupid this
works....)

-michael j zehr

rsalz@bbn.com (Rich Salz) (10/24/89)

Gee, if
	#ifndef _HAVE_FOO_H_
	... contents of foo.h ...
	#define _HAVE_FOO_H_
	#endif	/* _HAVE_FOO_H_ */

is too slow, then do this:
	#ifndef _HAVE_FOO_H_
	#include <hidden_real_name_of_foo.h>
	#define _HAVE_FOO_H_
	#endif	/* _HAVE_FOO_H_ */

John's rules about the first #ifndef after the first comment sound much
too complicated for me -- I have enough problem with election day...
	/r$
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.

crowl@cs.rochester.edu (Lawrence Crowl) (10/24/89)

In article <14240@well.UUCP> nagle@well.UUCP (John Nagle) notes three
solutions to the multiple file inclusion problem:

(1) An include file protects itself via #ifndef on a symbol that it defines.
    This causes the file to be read multiple times.

(2) Modifying the semantics of #include to only include a file once.  This is
    incompatible with the current definition of #include.

(3) His proposal for modifying the implementation of #include to recognize the
    idiom in (1).  Implementations need not read a file twice.

However, a solution to the multiple inclusion problem already exists.  It is
not necessary (nor desirable in my opinion) to modify #include semantics or
implementations.  The solution is as follows.

Each include file defines a symbol (preferably related to its name).  For
example, in foo.h:

        #define foo_h
        ...

Each file that includes foo.h protects the inclusion with an #ifndef:

        ...
        #ifndef foo_h
        #include "foo.h"
        #endif /* foo_h */
        ...

Programmers of include files can further protect against multiple inclusion
by using the standard mechanism:

        #ifndef foo_h
        #define foo_h 
        ...
        #endif /* foo_h */

This solution has the following properties:
- The compiler reads include files exactly once.
- No modifications to current systems are required.
- Includes are three lines long instead of one.
- A small (really) amount of additional programmer effort is required.

I have used this solution as a matter of course since shortly after I learned
to program in C.  It is an obvious solution, and leads me to wonder why it is
not common practice.  Any explanations?
-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl	Rochester, New York,  14627

hascall@atanasoff.cs.iastate.edu (John Hascall) (10/24/89)

In some article with a silly huge id#, Lawrence Crowl writes:
}In article <14240@well.UUCP> nagle@well.UUCP (John Nagle) notes three
}solutions to the multiple file inclusion problem:
 
}(2) Modifying the semantics of #include to only include a file once.  This is
}    incompatible with the current definition of #include.
 
}  [yet another scheme...]

    Since the impending ANSI standard requires that including a file more
    than once have exactly the same effect as including it once...why can't
    a compiler just ignore #includes for files it has already #included???
    (at least for the "standard" includes)
    
    Any comments from the ANSI mavens?


    I know some people use stuff like:

 main.c:                              subr.c:
    #define MAINDEF 1                    #define MAINDEF 0
    #include "vars.h"                    #include "vars.h"
    main()                               subrtn()
    { ...                                { ...

    but, does anyone do this thing in the *same* file??

John Hascall

tneff@bfmny0.UU.NET (Tom Neff) (10/24/89)

In article <1659@atanasoff.cs.iastate.edu> hascall@atanasoff.UUCP (John Hascall) writes:
>    Since the impending ANSI standard requires that including a file more
>    than once have exactly the same effect as including it once...why can't
>    a compiler just ignore #includes for files it has already #included???
>    (at least for the "standard" includes)

Including standard HEADER files should be idempotent.  Back here in the
real world there are plenty of uses for including a file multiple times
with a desired substantial effect on each inclusion.  Examples include
program generated data tables, copyright strings, and machine dependent
code sequences.  Any compiler that unconditionally ignored an include file
on the second mention would be horribly broken.
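
A minimal sketch of the data-table case, under one loud assumption: the
list macro ERROR_TABLE stands in for a generated file (say a hypothetical
"errors.def") so that the example fits in one source file.  In the usage
described above, each expansion of ERROR_TABLE would instead be another
#include of that file.  All names here are made up for illustration:

```c
#include <assert.h>
#include <string.h>

/* Stand-in for a generated table file; in real use each expansion below
 * would be a separate #include of that same file. */
#define ERROR_TABLE(ENTRY) \
    ENTRY(ERR_NONE,  "no error") \
    ENTRY(ERR_NOMEM, "out of memory") \
    ENTRY(ERR_IO,    "i/o error")

/* First pass over the table: build the enumeration. */
#define AS_ENUM(name, text) name,
enum error_code { ERROR_TABLE(AS_ENUM) ERR_COUNT };
#undef AS_ENUM

/* Second pass over the same table: build the matching messages. */
#define AS_TEXT(name, text) text,
static const char *const error_text[] = { ERROR_TABLE(AS_TEXT) };
#undef AS_TEXT

/* Look up the message for a code, with a fallback for bad values. */
const char *error_message(int code)
{
    if (code < 0 || code >= ERR_COUNT)
        return "unknown error";
    return error_text[code];
}
```

Both passes must see the full table; a compiler that dropped the second
one would leave error_text empty.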
-- 
"My God, Thiokol, when do you      \\    Tom Neff
want me to launch?  Next April?"   \\    tneff@bfmny0.UU.NET

ken@cs.rochester.edu (Ken Yap) (10/24/89)

|(1) An include file protects itself via #ifndef on a symbol that it defines.
|    This causes the file to be read multiple times.

But is this really as inefficient as people think? I tried the
following on a Sun-4/60

% wc grammar0.cc
     932    2944   19700 grammar0.cc
% g++ -I../h -E grammar0.cc | wc
    3728    8219   63497
% time g++ -I../h -E grammar0.cc > /tmp/foo.cc
0.4u 0.3s 0:01 44% 0+208k 0+9io 0pf+0w

Looks pretty insignificant compared to parsing and CG time.

Just to prove that multiple inclusions were attempted

% grep '#' /tmp/foo.cc | sort +2 -3 | uniq -2 -c
   2 # 1 "../h/pg_types.h" 1
   1 # 42 "../h/pg_types.h"
   1 # 1 "/usr/su/lib/g++-include/BitSet.h" 1
   2 # 28 "/usr/su/lib/g++-include/BitSet.h" 2
   1 # 1 "/usr/su/lib/g++-include/File.h" 1
   3 # 27 "/usr/su/lib/g++-include/File.h" 2
   1 # 1 "/usr/su/lib/g++-include/builtin.h" 1
   3 # 48 "/usr/su/lib/g++-include/builtin.h" 2
   1 # 1 "/usr/su/lib/g++-include/math.h" 1
   1 # 126 "/usr/su/lib/g++-include/math.h" 2
   2 # 1 "/usr/su/lib/g++-include/std.h" 1
   1 # 225 "/usr/su/lib/g++-include/std.h"
   2 # 1 "/usr/su/lib/g++-include/stddef.h" 1
   1 # 59 "/usr/su/lib/g++-include/stddef.h"
   1 # 1 "/usr/su/lib/g++-include/stdio.h" 1
   2 # 1 "/usr/su/lib/g++-include/stream.h" 1
   1 # 160 "/usr/su/lib/g++-include/stream.h"
   1 # 27 "/usr/su/lib/g++-include/stream.h" 2
   2 # 1 "/usr/su/lib/g++-include/values.h" 1
   2 # 92 "/usr/su/lib/g++-include/values.h"
   1 # 1 "error.h" 1
   2 # 1 "grammar.h" 1
   2 # 11 "grammar.h" 2
   1 # 127 "grammar.h"
   4 # 13 "grammar.h" 2
   1 # 1 "grammar0.cc"
  10 # 1 "grammar0.cc" 2
   2 # 1 "item.h" 1
   1 # 98 "item.h"
   2 # 1 "option.h" 1
   1 # 40 "option.h"
   1 # 1 "pg.h" 1
   2 # 10 "pg.h" 2
   1 # 1 "pggram.h" 1
   1 # 1 "predict.h" 1
   1 # 9 "predict.h" 2
   1 # 1 "production.h" 1
   3 # 10 "production.h" 2
   5 # 1 "symbol.h" 1
   1 # 10 "symbol.h" 2
   4 # 140 "symbol.h"
   1 # 9 "symbol.h" 2
   4 # 1 "symset.h" 1
   2 # 10 "symset.h" 2
   3 # 84 "symset.h"
   1 # 9 "symset.h" 2
   1 # 1 "symtab.h" 1
   1 # 9 "symtab.h" 2
   3 # 1 "termset.h" 1
   2 # 50 "termset.h"
   1 # 9 "termset.h" 2

Some of the .h files are pretty hefty, as you can see from the size of
the expanded source.

I don't think I will lose any sleep over what cpp is doing.

meissner@dg-rtp.dg.com (Michael Meissner) (10/24/89)

In article <1659@atanasoff.cs.iastate.edu>
hascall@atanasoff.cs.iastate.edu (John Hascall) writes:

>      Since the impending ANSI standard requires that including a file more
>      than once have exactly the same effect as including it once...why can't
>      a compiler just ignore #includes for files it has already #included???
>      (at least for the "standard" includes)
>      
>      Any comments from the ANSI mavens?

Only the include files specified by standard (stdio.h, string.h, etc.)
are required to work when included multiple times (i.e., there must be
some sort of guard around parts that cannot be redeclared, like
typedefs and structures).  No such requirement is mandated for any
other include file.
--

Michael Meissner, Data General.				If compiles where much
Uucp:		...!mcnc!rti!xyzzy!meissner		faster, when would we
Internet:	meissner@dg-rtp.DG.COM			have time for netnews?

sartin@hplabsz.HPL.HP.COM (Rob Sartin) (10/25/89)

In article <1659@atanasoff.cs.iastate.edu> hascall@atanasoff.UUCP (John Hascall) writes:
>    Since the impending ANSI standard requires that including a file more
>    than once have exactly the same effect as including it once...why can't
>    a compiler just ignore #includes for files it has already #included???
>    (at least for the "standard" includes)

That introduces some potential oddities for <assert.h>, which doesn't try
to prevent itself from being included twice.  My C++ 2.0 assert.h includes a
comment that says:

/* This header file intentionally has no wrapper, since the user
*  may want to re-include it to turn off/on assertions for only
*  a portion of the source file.
*/

>    but, does anyone do this thing in the *same* file??

=begin foo.c
#define NDEBUG
#include <assert.h>	/* NDEBUG defined: assert() expands to nothing */
int tested_function()
{
...
}

#undef NDEBUG
#include <assert.h>	/* re-inclusion turns assert() back on */
int untested_function()
{
...
}
=end foo.c

Rob Sartin			internet: sartin@hplabs.hp.com
Software Technology Lab 	uucp    : hplabs!sartin
Hewlett-Packard			voice	: (415) 857-7592

gwyn@smoke.BRL.MIL (Doug Gwyn) (10/25/89)

In article <14240@well.UUCP> nagle@well.UUCP (John Nagle) writes:
>      It's been proposed that the semantics of "#include" be changed to 
>avoid all multiple inclusion.  But this is controversial, and would require
>ANSI approval.

It's not especially controversial, because as you imply it would be a
change to a well-defined characteristic of the C language.  Thus when
it was proposed to X3J11, we had little difficulty in determining that
the proposal must be rejected.  Several people attested to the fact
that they have existing code that requires the existing semantics.

>      I propose a solution via compiler optimization.

Your solution does not at all seem to me to preserve existing semantics.

shap@delrey.sgi.com (Jonathan Shapiro) (10/25/89)

Why does everybody feel compelled to reinvent this wheel?

The current most widely accepted solution for single inclusion is to
insert a pragma into the header file:

   #pragma once

Jonathan Shapiro
Synergistic Computing Associates

shap@delrey.sgi.com (Jonathan Shapiro) (10/25/89)

Okay, here's another cute trick.

Have an include file called include-files.h, which contains things
like

   #define FRED_H "fred.h"
   #define WILMA_H "wilma.h"

include them by doing the following in all source files:

   #include "include-files.h"  // done once in all source files
  
   #include FRED_H
   #include ...

inside fred.h do the following:

   #ifdef FRED_H
   #undef FRED_H
   #define FRED_H "/dev/null"
   #endif

This works, doesn't require much overhead, and can be automatically
applied to existing code by a fairly simple shell script.

Jonathan S. Shapiro
Synergistic Computing Associates

bph@buengc.BU.EDU (Blair P. Houghton) (10/25/89)

How about ( in <foo.h>):

	#pragma Never_Again

Which tells the BlairTech/ANSI1.0 compiler that when it hits
a #include to read this file again, it should just ignore it.

				--Blair
				  "Which is what the original
				   poster was saying, only in an
				   'I want it standardized, maybe'
				   manner..."

henry@utzoo.uucp (Henry Spencer) (10/25/89)

In article <11396@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>>      I propose a solution via compiler optimization.
>
>Your solution does not at all seem to me to preserve existing semantics.

Can you elaborate on this, Doug?  Seems to me like what he was proposing --
have compiler recognize files bracketed with `#ifndef FOO_H' and remember
the bracketing -- comes under the "as if" rule.  Re-including such a file
with FOO_H defined cannot possibly have any effect except to slow down
the compilation.
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

nagle@well.UUCP (John Nagle) (10/27/89)

In article <1989Oct24.060920.28655@cs.rochester.edu> ken@cs.rochester.edu writes:
->But is this really as inefficient as people think? I tried the
->following on a Sun-4/60
->
->% wc grammar0.cc
->     932    2944   19700 grammar0.cc
->% g++ -I../h -E grammar0.cc | wc
->    3728    8219   63497
->% time g++ -I../h -E grammar0.cc > /tmp/foo.cc
->0.4u 0.3s 0:01 44% 0+208k 0+9io 0pf+0w
->
->Looks pretty insignificant compared to parsing and CG time.

       What are you comparing to what?  Only one time measurement is given.
This makes it rather meaningless to draw any conclusions.


					John Nagle

diamond@csl.sony.co.jp (Norman Diamond) (10/27/89)

Someone:

>>>      I propose a solution via compiler optimization.

In article <11396@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:

>>Your solution does not at all seem to me to preserve existing semantics.

In article <1989Oct25.164145.29980@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:

>Can you elaborate on this, Doug?  Seems to me like what he was proposing --
>have compiler recognize files bracketed with `#ifndef FOO_H' and remember
>the bracketing -- comes under the "as if" rule.  Re-including such a file
>with FOO_H defined cannot possibly have any effect except to slow down
>the compilation.

I mostly agree with Mr. Spencer.  However, perhaps we need an additional
interpretation ahead of this one.  Is inclusion a "volatile" operation?

Hmm.  In fact, if a program is being interpreted instead of compiled,
the program itself can be responsible for its include files being
changed between two successive inclusions.  Now maybe C will finally
replace LISP.  :-) :-)

-- 
Norman Diamond, Sony Corp. (diamond%ws.sony.junet@uunet.uu.net seems to work)
  Should the preceding opinions be caught or     |  James Bond asked his
  killed, the sender will disavow all knowledge  |  ATT rep for a source
  of their activities or whereabouts.            |  licence to "kill".

ken@cs.rochester.edu (Ken Yap) (10/27/89)

|->But is this really as inefficient as people think? I tried the
|->following on a Sun-4/60
|->
|->% wc grammar0.cc
|->     932    2944   19700 grammar0.cc
|->% g++ -I../h -E grammar0.cc | wc
|->    3728    8219   63497
|->% time g++ -I../h -E grammar0.cc > /tmp/foo.cc
|->0.4u 0.3s 0:01 44% 0+208k 0+9io 0pf+0w
|->
|->Looks pretty insignificant compared to parsing and CG time.
|
|       What are you comparing to what?  Only one time measurement is given.
|This makes it rather meaningless to draw any conclusions.

Sorry, sloppy of me:

% time g++ -I../h -c grammar0.cc
10.5u 2.5s 0:22 57% 0+1532k 34+37io 202pf+0w

About 5% of the total CPU time (0.7 of 13.0 seconds, user plus system) is
not something I care to worry about at this time.

gwyn@smoke.BRL.MIL (Doug Gwyn) (10/28/89)

In article <1989Oct25.164145.29980@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
-Can you elaborate on this, Doug?  Seems to me like what he was proposing --
-have compiler recognize files bracketed with `#ifndef FOO_H' and remember
-the bracketing -- comes under the "as if" rule.  Re-including such a file
-with FOO_H defined cannot possibly have any effect except to slow down
-the compilation.

I received some private correspondence on this also, and apparently I
didn't grasp the actual meat of the proposal.

I suppose if further #undef/#define of the identifier were properly
tracked, it would work.

dhesi@sun505.UUCP (Rahul Dhesi) (10/28/89)

In article <1088@odin.SGI.COM> shap@delrey.sgi.com (Jonathan Shapiro)
writes:

>   #include FRED_H

Please be alert for problems.  K&R requires the token after the
"#include" to be a filename enclosed in double quotes or angle
brackets, not an arbitrary symbol.  It was not until the ANSI C
standard that the generalized syntax was blessed.

Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi
Use above addresses--email sent here via Sun.com will probably bounce.

marc@dumbcat.UUCP (Marco S Hyman) (10/29/89)

In article <1011@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
    In article <1088@odin.SGI.COM> shap@delrey.sgi.com (Jonathan Shapiro)
    writes:
    
    >   #include FRED_H
    
    Please be alert for problems.  K&R requires the token after the
    "#include" to be a filename enclosed in double quotes or angle
    brackets, not an arbitrary symbol.

And the C compiler shipped with System V/386 3.2 (ISC's flavor) coffs
(I couldn't help myself ;-) on that one -- At least it didn't like
#include __FILE__.  (Another reason to use gcc/g++).

--marc
-- 
// Marco S. Hyman		{ames,pyramid,sun}!pacbell!dumbcat!marc

dhesi@sunscreen.UUCP (Rahul Dhesi) (10/30/89)

In article <1087@odin.SGI.COM> shap@delrey.sgi.com (Jonathan Shapiro) writes:
>The current most widely accepted solution for single inclusion is to
>insert a pragma into the header file:
>
>   #pragma once

A serious mistake, because this pragma can affect the meaning of a
program, and therefore cannot be safely ignored.

Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi
Use above addresses--email sent here via Sun.com will probably bounce.