[comp.lang.c++] A solution to the multiple inclusion problem

nagle@well.UUCP (John Nagle) (10/23/89)

      The problem is well-known.  It would be desirable if all include files
themselves included everything they needed, so that order of inclusion 
was taken care of automatically.  But, of course, this results in multiple
inclusion of the same files, which causes problems.  A common work-around
for this problem is to use a construct like the following in each include file:

	#ifndef XXX
	#define XXX
	...content...
	#endif

This works, but on the second inclusion, the file still has to be read and
parsed, at least by the level of processing that reads "#" statements.
(Many newer compilers do "#" processing in the same pass as main compilation,
so referring to a "preprocessor" in this context is not necessarily correct.)
With widespread use of this technique within library files, some files may
be read a large number of times, mostly to be ignored.  This slows compilation.
The problem is especially severe in large C++ programs, where large numbers of
header files are necessary, and nested header files are not at all uncommon.

      It's been proposed that the semantics of "#include" be changed to 
avoid all multiple inclusion.  But this is controversial, and would require
ANSI approval.

      I propose a solution via compiler optimization.
The compiler should behave as follows:

	1.  If, when reading an "included" file, there are
	    no non-comment statements before the first "#ifndef"
	    (if any), and no non-comment statements after the "#endif"
	    matching said "#ifndef", the compiler shall associate
	    the tag found on the "#ifndef" line with the name of the
	    "#include" file.

	2.  When processing an "#include" statement, if the file has
	    an associated tag as defined in 1) above, and the tag is
	    defined (in the sense of "#define") the file shall not be
	    included.

This is a completely compatible solution to the problem.  Old compilers
will compile include files written with the "ifndef" convention correctly,
but slowly, and new compilers will do it faster.  No standardization
action is required.  Any implementor can install this now, and speed up
their product.
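
The two rules above can be sketched as a small lookup that a compiler's #include handler might consult. This is only an illustration under stated assumptions: the table layout and the names remember_guard, define_macro, undefine_macro and can_skip_include are made up here, not taken from any real compiler, and the check for the wrapper shape required by rule 1 is assumed to have been done already.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 64

/* Hypothetical table filled in by rule 1: header name -> guard macro.
   Pointers are stored as given; a real implementation would copy them. */
struct guard_entry {
    const char *file;
    const char *guard;
};
static struct guard_entry guards[MAX_ENTRIES];
static size_t n_guards;

/* Hypothetical stand-in for the preprocessor's set of #defined names. */
static const char *defined_macros[MAX_ENTRIES];
static size_t n_defined;

static int is_defined(const char *name)
{
    size_t i;
    for (i = 0; i < n_defined; i++)
        if (strcmp(defined_macros[i], name) == 0)
            return 1;
    return 0;
}

/* Called for "#define name".  Returns 0 if the table is full. */
int define_macro(const char *name)
{
    if (is_defined(name))
        return 1;
    if (n_defined == MAX_ENTRIES)
        return 0;
    defined_macros[n_defined++] = name;
    return 1;
}

/* Called for "#undef name". */
void undefine_macro(const char *name)
{
    size_t i;
    for (i = 0; i < n_defined; i++) {
        if (strcmp(defined_macros[i], name) == 0) {
            defined_macros[i] = defined_macros[--n_defined];
            return;
        }
    }
}

/* Rule 1: after reading a file that is wholly wrapped in
   "#ifndef guard ... #endif", remember the association. */
int remember_guard(const char *file, const char *guard)
{
    if (n_guards == MAX_ENTRIES)
        return 0;
    guards[n_guards].file = file;
    guards[n_guards].guard = guard;
    n_guards++;
    return 1;
}

/* Rule 2: an #include may be skipped only if the file has a remembered
   guard and that guard is defined at this moment. */
int can_skip_include(const char *file)
{
    size_t i;
    for (i = 0; i < n_guards; i++)
        if (strcmp(guards[i].file, file) == 0)
            return is_defined(guards[i].guard);
    return 0;
}
```

The skip is keyed on the guard being defined at the moment of the #include, not on the file having been seen before. A file whose guard has since been #undef'ed is therefore read again, just as it would be without the optimization.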

      One could argue for a more elegant but less compatible solution, but
the political hassles aren't worth it.

					John Nagle

tada@athena.mit.edu (Michael J Zehr) (10/24/89)

In article <14240@well.UUCP> nagle@well.UUCP (John Nagle) writes:
>[problem of multiple inclusion, generally solved by:]
>	#ifndef XXX
>	#define XXX
>	...content...
>	#endif
>This works, but on the second inclusion, the file still has to be read and
>parsed, at least by the level of processing that reads "#" statements.
>[slowing compilation, particularly in C++ programs]

>[proposed compiler optimization binding the 'XXX' defined above to the
>included file and not including it a second time.]

there's another solution you can manage without having to get your
compiler vendor to make such an extension:

<foo.h:>
#ifndef FOO
#define FOO
#include "foo_real.h"
#endif

admittedly there's an extra file open for the first time reading
through, and you still have to open files when #including the second
time, but it's not as bad as having to read the entire file in
(particularly if you have some really *large* files).  library files
could be changed to this with relatively little effort and no compiler
changes, and users' header files for large systems could be changed to
work faster *today* without having to wait for the vendors to decide to
do this.

(unless i'm missing something obvious and being really stupid this
works....)

-michael j zehr

rsalz@bbn.com (Rich Salz) (10/24/89)

Gee, if
	#ifndef _HAVE_FOO_H_
	... contents of foo.h ...
	#define _HAVE_FOO_H_
	#endif	/* _HAVE_FOO_H_ */

is too slow, then do this:
	#ifndef _HAVE_FOO_H_
	#include <hidden_real_name_of_foo.h>
	#define _HAVE_FOO_H_
	#endif	/* _HAVE_FOO_H_ */

John's rules about the first #ifndef after the first comment sound much
too complicated for me -- I have enough problem with election day...
	/r$
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.

crowl@cs.rochester.edu (Lawrence Crowl) (10/24/89)

In article <14240@well.UUCP> nagle@well.UUCP (John Nagle) notes three
solutions to the multiple file inclusion problem:

(1) An include file protects itself via #ifndef on a symbol that it defines.
    This causes the file to be read multiple times.

(2) Modifying the semantics of #include to only include a file once.  This is
    incompatible with the current definition of #include.

(3) His proposal for modifying the implementation of #include to recognize the
    idiom in (1).  Implementations need not read a file twice.

However, a solution to the multiple inclusion problem already exists.  It is
not necessary (nor desirable in my opinion) to modify #include semantics or
implementations.  The solution is as follows.

Each include file defines a symbol (preferably related to its name).  For
example, in foo.h:

        #define foo_h
        ...

Each file that includes foo.h protects the inclusion with an #ifndef:

        ...
        #ifndef foo_h
        #include "foo.h"
        #endif /* foo_h */
        ...

Programmers of include files can further protect against multiple inclusion
by using the standard mechanism:

        #ifndef foo_h
        #define foo_h 
        ...
        #endif /* foo_h */

This solution has the following properties:
- The compiler reads include files exactly once.
- No modifications to current systems are required.
- Includes are three lines long instead of one.
- A small (really) amount of additional programmer effort is required.

I have used this solution as a matter of course since shortly after I learned
to program in C.  It is an obvious solution, and leads me to wonder why it is
not common practice.  Any explanations?
-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl	Rochester, New York,  14627

hascall@atanasoff.cs.iastate.edu (John Hascall) (10/24/89)

In some article with a silly huge id#, Lawrence Crowl writes:
}In article <14240@well.UUCP> nagle@well.UUCP (John Nagle) notes three
}solutions to the multiple file inclusion problem:
 
}(2) Modifying the semantics of #include to only include a file once.  This is
}    incompatible with the current definition of #include.
 
}  [yet another scheme...]

    Since the impending ANSI standard requires that including a file more
    than once have exactly the same effect as including it once...why can't
    a compiler just ignore #includes for files it has already #included???
    (at least for the "standard" includes)
    
    Any comments from the ANSI mavens?


    I know some people use stuff like:

 main.c:                              subr.c:
    #define MAINDEF 1                    #define MAINDEF 0
    #include "vars.h"                    #include "vars.h"
    main()                               subrtn()
    { ...                                { ...

    but, does anyone do this thing in the *same* file??

John Hascall

tneff@bfmny0.UU.NET (Tom Neff) (10/24/89)

In article <1659@atanasoff.cs.iastate.edu> hascall@atanasoff.UUCP (John Hascall) writes:
>    Since the impending ANSI standard requires that including a file more
>    than once have exactly the same effect as including it once...why can't
>    a compiler just ignore #includes for files it has already #included???
>    (at least for the "standard" includes)

Including standard HEADER files should be idempotent.  Back here in the
real world there are plenty of uses for including a file multiple times
with a desired substantial effect on each inclusion.  Examples include
program generated data tables, copyright strings, and machine dependent
code sequences.  Any compiler that unconditionally ignored an include file
on the second mention would be horribly broken.
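
One common shape of the "data table" case is a list of entries that is expanded more than once, with a different macro definition in force each time. The sketch below keeps everything in a single file so that it compiles on its own: the table is written as a macro, whereas in the file-based form the same ENTRY lines would sit in an unguarded header and each expansion of COLOR_TABLE would be an #include of it. All names here (COLOR_TABLE, ENTRY, color_name) are invented for the illustration.

```c
#include <string.h>

/* The "table".  In the file-based variant these ENTRY lines would live
   in an unguarded header that is #included once per expansion. */
#define COLOR_TABLE       \
    ENTRY(RED,   "red")   \
    ENTRY(GREEN, "green") \
    ENTRY(BLUE,  "blue")

/* First expansion: build the enumeration. */
#define ENTRY(id, text) COLOR_##id,
enum color { COLOR_TABLE COLOR_COUNT };
#undef ENTRY

/* Second expansion: build the matching table of names. */
#define ENTRY(id, text) text,
static const char *const color_names[] = { COLOR_TABLE };
#undef ENTRY

/* Returns the name for a color, or "?" if the value is out of range. */
const char *color_name(enum color c)
{
    return ((unsigned)c < (unsigned)COLOR_COUNT) ? color_names[c] : "?";
}
```

A compiler that ignored the second inclusion of such a table file would leave color_names empty, which is the kind of breakage being warned about.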
-- 
"My God, Thiokol, when do you      \\    Tom Neff
want me to launch?  Next April?"   \\    tneff@bfmny0.UU.NET

ken@cs.rochester.edu (Ken Yap) (10/24/89)

|(1) An include file protects itself via #ifndef on a symbol that it defines.
|    This causes the file to be read multiple times.

But is this really as inefficient as people think? I tried the
following on a Sun-4/60

% wc grammar0.cc
     932    2944   19700 grammar0.cc
% g++ -I../h -E grammar0.cc | wc
    3728    8219   63497
% time g++ -I../h -E grammar0.cc > /tmp/foo.cc
0.4u 0.3s 0:01 44% 0+208k 0+9io 0pf+0w

Looks pretty insignificant compared to parsing and CG time.

Just to prove that multiple inclusions were attempted

% grep '#' /tmp/foo.cc | sort +2 -3 | uniq -2 -c
   2 # 1 "../h/pg_types.h" 1
   1 # 42 "../h/pg_types.h"
   1 # 1 "/usr/su/lib/g++-include/BitSet.h" 1
   2 # 28 "/usr/su/lib/g++-include/BitSet.h" 2
   1 # 1 "/usr/su/lib/g++-include/File.h" 1
   3 # 27 "/usr/su/lib/g++-include/File.h" 2
   1 # 1 "/usr/su/lib/g++-include/builtin.h" 1
   3 # 48 "/usr/su/lib/g++-include/builtin.h" 2
   1 # 1 "/usr/su/lib/g++-include/math.h" 1
   1 # 126 "/usr/su/lib/g++-include/math.h" 2
   2 # 1 "/usr/su/lib/g++-include/std.h" 1
   1 # 225 "/usr/su/lib/g++-include/std.h"
   2 # 1 "/usr/su/lib/g++-include/stddef.h" 1
   1 # 59 "/usr/su/lib/g++-include/stddef.h"
   1 # 1 "/usr/su/lib/g++-include/stdio.h" 1
   2 # 1 "/usr/su/lib/g++-include/stream.h" 1
   1 # 160 "/usr/su/lib/g++-include/stream.h"
   1 # 27 "/usr/su/lib/g++-include/stream.h" 2
   2 # 1 "/usr/su/lib/g++-include/values.h" 1
   2 # 92 "/usr/su/lib/g++-include/values.h"
   1 # 1 "error.h" 1
   2 # 1 "grammar.h" 1
   2 # 11 "grammar.h" 2
   1 # 127 "grammar.h"
   4 # 13 "grammar.h" 2
   1 # 1 "grammar0.cc"
  10 # 1 "grammar0.cc" 2
   2 # 1 "item.h" 1
   1 # 98 "item.h"
   2 # 1 "option.h" 1
   1 # 40 "option.h"
   1 # 1 "pg.h" 1
   2 # 10 "pg.h" 2
   1 # 1 "pggram.h" 1
   1 # 1 "predict.h" 1
   1 # 9 "predict.h" 2
   1 # 1 "production.h" 1
   3 # 10 "production.h" 2
   5 # 1 "symbol.h" 1
   1 # 10 "symbol.h" 2
   4 # 140 "symbol.h"
   1 # 9 "symbol.h" 2
   4 # 1 "symset.h" 1
   2 # 10 "symset.h" 2
   3 # 84 "symset.h"
   1 # 9 "symset.h" 2
   1 # 1 "symtab.h" 1
   1 # 9 "symtab.h" 2
   3 # 1 "termset.h" 1
   2 # 50 "termset.h"
   1 # 9 "termset.h" 2

Some of the .h files are pretty hefty, as you can see from the size of
the expanded source.

I don't think I will lose any sleep over what cpp is doing.

leo@duttnph.tudelft.nl (Leo Breebaart) (10/24/89)

The problem of managing include-files for c++ libraries is also
discussed in the following paper:

Managing C++ Libraries
James M. Coggins & Gregory Bollella
SIGPLAN Notices, Vol. 24, No. 6

I think it is a very interesting article, not least because they
explain the +e0/+e1 options, which I didn't know about, and which
have since saved me oodles of disk space and link time.

Leo Breebaart (leo @ duttnph.tudelft.nl)

meissner@dg-rtp.dg.com (Michael Meissner) (10/24/89)

In article <1659@atanasoff.cs.iastate.edu>
hascall@atanasoff.cs.iastate.edu (John Hascall) writes:

>      Since the impending ANSI standard requires that including a file more
>      than once have exactly the same effect as including it once...why can't
>      a compiler just ignore #includes for files it has already #included???
>      (at least for the "standard" includes)
>      
>      Any comments from the ANSI mavens?

Only the include files specified by standard (stdio.h, string.h, etc.)
are required to work when included multiple times (i.e., there must be
some sort of guard around parts that cannot be redeclared, like
typedefs and structures).  No such requirement is mandated for any
other include file.
--

Michael Meissner, Data General.				If compiles were much
Uucp:		...!mcnc!rti!xyzzy!meissner		faster, when would we
Internet:	meissner@dg-rtp.DG.COM			have time for netnews?

johnc@plx.UUCP (John C.) (10/24/89)

I also use the multiple-inclusion solution presented by one of the
participants in the discussion, with a minor optimization.  My version:

1) All include files have the following format:

/* file foo.h */
#ifndef FOO_H
#define FOO_H
...contents of foo.h...
#endif

(For "standard" include files, you pick one of the symbols defined in
that file as its tag, say "PASCAL" for Microsoft's WINDOWS.H header.)

2) All _include files_ that include other headers do so as follows:

...
#ifndef BAR_H
#include <bar.h>  /* ...or "bar.h", as appropriate */
#endif
...

3) [The optimization]
It is *not* necessary to do (2) in .C files, only in .H files, because
a .C file is never multiply included (nobody #include's .C files,
right? right?? ).  So, .C files just #include without the enclosing
#ifndef/#endif.
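
A minimal single-file sketch of practice (2), using a standard header so that it compiles on its own. It follows the suggestion above of picking a symbol the header defines as its tag: EOF is a macro that <stdio.h> must define. The two *_OPENED_FILE macros and the two functions are invented here purely to make the effect observable.

```c
/* First request for <stdio.h>, guarded by the includer. */
#ifndef EOF
#include <stdio.h>
#define FIRST_REQUEST_OPENED_FILE 1
#else
#define FIRST_REQUEST_OPENED_FILE 0   /* something earlier already included it */
#endif

/* Second request: EOF is defined by now, so the #include line is never
   reached and the header is not opened or scanned again. */
#ifndef EOF
#include <stdio.h>
#define SECOND_REQUEST_OPENED_FILE 1
#else
#define SECOND_REQUEST_OPENED_FILE 0
#endif

int first_request_opened_file(void)  { return FIRST_REQUEST_OPENED_FILE; }
int second_request_opened_file(void) { return SECOND_REQUEST_OPENED_FILE; }
```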


Other remarks:

a) Plexus develops Microsoft Windows programs.  Microsoft's main 
header for Windows programming in C, WINDOWS.H, is about 80 KBytes!
We saw at least a 2x compilation speedup when we instituted
practice (2) above, preventing the re-scanning of previously-included
headers.  WINDOWS.H was being scanned up to five times in certain cases!
This may not affect compilation speed in some environments, but it
certainly does in MS-DOS.

b) Regarding modifications to the semantics of #include (so the compiler
would keep track of which file names had already been scanned), you
might be interested to know that MetaWare's compilers have had a
"conditional include" (C_Include) directive for several years now,
which does exactly this.  Perhaps the preferred language change, if 
any, is to define such a "conditional-include" directive, perhaps 
"#cond_include" or "#c_include".  [How about it, Dr. Stroustrup?]

[My preference is actually for a "compiled definitions" approach,
a la Mesa or Modula-2...]

c) Finally, in response to the argument that it's useful for the second
and subsequent inclusion of a file to have a different (and non-null)
effect, I feel *strongly* that this is an unmaintainable practice
and that there are probably better ways to do anything this is used
to accomplish.

/John Ciccarelli  [...sun!plx!johnc, or johnc@plx.uucp]

davidm@uunet.UU.NET (David S. Masterson) (10/24/89)

In article <14240@well.UUCP> nagle@well.UUCP (John Nagle) writes:

   (Many newer compilers do "#" processing in the same pass as main
   compilation, so referring to a "preprocessor" in this context is not
   necessarily correct.)

I suppose that this is a performance enhancement and that now "#" processing
is being considered so much a part of the language that compiler writers feel
justified in doing this.  However, doesn't this limit flexibility?  Because of
this, the problem being discussed results in having to change the compiler as
opposed to changing the (probably simpler) preprocessor.

--
===================================================================
David Masterson					Consilium, Inc.
uunet!cimshop!davidm				Mt. View, CA  94043
===================================================================
		"Nobody here but us chickens..."

davidm@uunet.UU.NET (David S. Masterson) (10/25/89)

In article <1989Oct23.191634.6345@cs.rochester.edu> crowl@cs.rochester.edu (Lawrence Crowl) writes:

   [discussion of standard include mechanism]

   - A small (really) amount of additional programmer effort is required.

Yeh, but programmers are notoriously lazy.  Computers do stuff like this far
better than humans do, once they've been taught how.  Perhaps it's time to
teach the computers.

davidm@uunet.UU.NET (David S. Masterson) (10/25/89)

In article <1659@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu (John Hascall) writes:

       Since the impending ANSI standard requires that including a file more
       than once have exactly the same effect as including it once...why can't
       a compiler just ignore #includes for files it has already #included???
       (at least for the "standard" includes)

       Any comments from the ANSI mavens?

What????  (I hope this isn't true)

Doesn't the "#if" mechanism suggest that it is very possible to include a file
more than once and have different things happen?  For instance (not a really
good example, though):

setup.h:				main.c:
#ifdef GENERIC				#define GENERIC
#define INTSIZE 16			#include "setup.h"
#endif
#ifdef VAX				processor.c
#define INTSIZE 32			#define VAX
#endif					#include "setup.h"

kearns@cs.columbia.edu (steve) (10/25/89)

The best solution of all is to (conceptually) get rid of separate
compilation.  One might define a "project" file which is basically a
list of the .c, .lib, and .h files that make up a program.  The C compiler
would know about this project file; you no longer compile a file, you
compile a whole project.  

Conceptually, with a project structure, all the non-static function and
variable definitions are available to any file in the project.  No 
#include statements are needed.  The compiler can precompile definitions
etc... to implement separate compilation, but all of that is an implementation
detail.  For backwards compatibility one can define a "file interchange 
format" which mainly requires the compiler to generate a set of files
with the proper includes.  

Lightspeed C for the Macintosh has a small part of this system.  You define
a project by listing the .c and .lib files; it scans the .c files for
the includes to automatically construct a makefile for you.  

-steve
(kearns@cs.columbia.edu)

sartin@hplabsz.HPL.HP.COM (Rob Sartin) (10/25/89)

In article <1659@atanasoff.cs.iastate.edu> hascall@atanasoff.UUCP (John Hascall) writes:
>    Since the impending ANSI standard requires that including a file more
>    than once have exactly the same effect as including it once...why can't
>    a compiler just ignore #includes for files it has already #included???
>    (at least for the "standard" includes)

That introduces some potential oddities for <assert.h>, which doesn't try
to prevent itself from being included twice.  My C++ 2.0 assert.h includes a
comment that says:

/* This header file intentionally has no wrapper, since the user
*  may want to re-include it to turn off/on assertions for only
*  a portion of the source file.
*/

>    but, does anyone do this thing in the *same* file??

=begin foo.c
#define NDEBUG
#include <assert.h>
int tested_function()
{
...
}

#undef NDEBUG
#include <assert.h>
int untested_function()
{
}
=end foo.c
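
A compilable variant of the same idea, as a sketch: the C standard has assert redefined according to the current state of NDEBUG each time <assert.h> is included. The counter and the helper names (evaluations, counted_true, run_both) are additions for the illustration, used only to show which assertion is actually evaluated.

```c
/* Counts how many times an assertion argument was actually evaluated. */
static int evaluations = 0;

static int counted_true(void)
{
    evaluations++;
    return 1;
}

#define NDEBUG
#include <assert.h>          /* assert() now expands to ((void)0) */
static void tested_function(void)
{
    assert(counted_true());  /* argument is never evaluated */
}

#undef NDEBUG
#include <assert.h>          /* assert() is active again from here on */
static void untested_function(void)
{
    assert(counted_true());  /* argument is evaluated and checked */
}

/* Runs both functions and reports how many assertions were evaluated. */
int run_both(void)
{
    evaluations = 0;
    tested_function();
    untested_function();
    return evaluations;
}
```

Only the assertion in untested_function is evaluated, so run_both returns 1. A compiler that skipped the second #include of <assert.h> would leave assertions switched off for the rest of the file.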

Rob Sartin			internet: sartin@hplabs.hp.com
Software Technology Lab 	uucp    : hplabs!sartin
Hewlett-Packard			voice	: (415) 857-7592

gwyn@smoke.BRL.MIL (Doug Gwyn) (10/25/89)

In article <14240@well.UUCP> nagle@well.UUCP (John Nagle) writes:
>      It's been proposed that the semantics of "#include" be changed to 
>avoid all multiple inclusion.  But this is controversial, and would require
>ANSI approval.

It's not especially controversial, because as you imply it would be a
change to a well-defined characteristic of the C language.  Thus when
it was proposed to X3J11, we had little difficulty in determining that
the proposal must be rejected.  Several people attested to the fact
that they have existing code that requires the existing semantics.

>      I propose a solution via compiler optimization.

Your solution does not at all seem to me to preserve existing semantics.

shap@delrey.sgi.com (Jonathan Shapiro) (10/25/89)

Why does everybody feel compelled to reinvent this wheel?

The current most widely accepted solution for single inclusion is to
insert a pragma into the header file:

   #pragma once

Jonathan Shapiro
Synergistic Computing Associates

shap@delrey.sgi.com (Jonathan Shapiro) (10/25/89)

Okay, here's another cute trick.

Have an include file called include-files.h, which contains things
like

   #define FRED_H "fred.h"
   #define WILMA_H "wilma.h"

include them by doing the following in all source files:

   #include "include-files.h"  // done once in all source files
  
   #include FRED_H
   #include ...

inside FRED_H do the following:

   #ifdef FRED_H
   #undef FRED_H
   #define FRED_H "/dev/null"
   #endif

This works, doesn't require much overhead, and can be applied
automatically to existing code by a fairly simple shell script.

Jonathan S. Shapiro
Synergistic Computing Associates

schwartz@psuvax1.cs.psu.edu (Scott Schwartz) (10/25/89)

One instance of prior art is found in gcc, which uses ``#pragma once''
to get the desired effect.

--
Scott Schwartz		<schwartz@shire.cs.psu.edu>
Now back to our regularly scheduled programming....

bph@buengc.BU.EDU (Blair P. Houghton) (10/25/89)

How about ( in <foo.h>):

	#pragma Never_Again

Which tells the BlairTech/ANSI1.0 compiler that when it hits
a #include to read this file again, it should just ignore it.

				--Blair
				  "Which is what the original
				   poster was saying, only in an
				   'I want it standardized, maybe'
				   manner..."

dld@F.GP.CS.CMU.EDU (David Detlefs) (10/25/89)

There have been a lot of posts on this subject ever since John Nagle
made the original post.  I believe that most of them miss the point.

The various objections that have been made, and responses to them:

1) It's bad practice to have non-idempotent include files.

Answer) Sure, I think I agree, but there are programs that use them.

2) The ANSI standard will mandate single-inclusion semantics (John Hascall.)

Answer) As has been pointed out (Tom Neff, Michael Meissner), only for
explicitly specified library include files.  Others must obey current
semantics.

3) You could a) move the #ifndef/#endif into the *including* file
(Lawrence Crowl, John Ciccarelli, Peter da Silva), or b) add a new
"#include-once" or "#pragma once" (Scott Schwartz, Blair Houghton,
Jonathan Shapiro) directive to the preprocessor and use that to get
the same effect.

Answer) Sure you could, but Nagle's proposal gets you optimal
performance with *no* semantic changes, and *no* extra programmer
work.

4) You could write your include files in 2 parts, a "wrapper" that
only includes the "real" file once. (Michael J. Zehr, MIT, Rich Salz)

Answer) I believe you will find that a significant proportion of the
cost of multiple includes is embodied in the disk seeks necessary
to find the file just to open it.  This would imply that this scheme
will not save as much as you'd like.  But this is a guess on my part,
subject to empirical test.  Does the proponent of the wrapper scheme
care to run the comparison?  In any case, observe that Nagle's scheme
performs *optimally* with no extra programmer effort of this kind.

5) The performance difference may not be worth worrying about (Ken
Yap.)

Answer) Often true, but in the cases where it's not true, it can be
very not true (My own experience, John Ciccarelli).

**** The point of Nagle's suggestion is that

1) There is an existing very often used method for getting
include-once *semantics*, viz., writing include files in the form

#ifndef <FOO>
#define <FOO> 1
<body>
#endif

2) He presents a technique that improves performance without modifying
semantics in *any way.*  (Do you know how rare such an idea is!?!)

Conclusion: IMHO, anybody who's read Nagle's post and implements a C
preprocessor (or compiler that incorporates one) and doesn't use the
technique doesn't recognize a good thing when it walks up and sits in
his/her lap.  Even if your CPP incorporates a mechanism such as
#pragma once, this will still help if the compiler is used on any of
the vast existing body of code that doesn't use #pragma once.

Dave
--
Dave Detlefs			Any correlation between my employer's opinion
Carnegie-Mellon CS		and my own is statistical rather than causal,
dld@cs.cmu.edu			except in those cases where I have helped to
				form my employer's opinion.  (Null disclaimer.)

henry@utzoo.uucp (Henry Spencer) (10/25/89)

In article <11396@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>>      I propose a solution via compiler optimization.
>
>Your solution does not at all seem to me to preserve existing semantics.

Can you elaborate on this, Doug?  Seems to me like what he was proposing --
have compiler recognize files bracketed with `#ifndef FOO_H' and remember
the bracketing -- comes under the "as if" rule.  Re-including such a file
with FOO_H defined cannot possibly have any effect except to slow down
the compilation.
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bright@Data-IO.COM (Walter Bright) (10/26/89)

In article <CIMSHOP!DAVIDM.89Oct24095658@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>In article <14240@well.UUCP> nagle@well.UUCP (John Nagle) writes:
<   (Many newer compilers do "#" processing in the same pass as main
<   compilation, so referring to a "preprocessor" in this context is not
<   necessarily correct.)
<I suppose that this is a performance enhancement and that now "#" processing
<is being considered so much a part of the language that compiler writers feel
<justified in doing this.  However, doesn't this limit flexibility?

Integrating the preprocessor into the parser is a big performance win
because you save a file write and read of the source. Loading and starting
one program is also faster than loading and starting two. The MS-DOS
compiler market is such that an integrated preprocessor is pretty much
required.

An integrated preprocessor has an advantage usually because it knows about
types and typedefs, so sizeof expressions can be used in #if expressions
(though ANSI C disallows it).
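
Since ANSI C does not allow sizeof in #if, a portable way to get a similar effect is to test the <limits.h> macros in the preprocessor and leave any sizeof check to the compiler proper. A minimal sketch follows; INT_BITS, int_bits_check and int_bits are names invented for this example, and the check assumes int has no padding bits.

```c
#include <limits.h>

/* Chosen by the preprocessor from <limits.h>; no sizeof is needed here. */
#if UINT_MAX == 0xFFFFu
#define INT_BITS 16
#elif UINT_MAX == 0xFFFFFFFFu
#define INT_BITS 32
#else
/* Fallback: not usable inside #if, but fine in ordinary expressions. */
#define INT_BITS ((int)(sizeof(int) * CHAR_BIT))
#endif

/* Compile-time check done by the compiler proper, where sizeof is legal:
   the array size becomes -1 (an error) if the assumption does not hold. */
typedef char int_bits_check[(INT_BITS == (int)(sizeof(int) * CHAR_BIT)) ? 1 : -1];

/* Returns the number of bits in an int as selected above. */
int int_bits(void)
{
    return INT_BITS;
}
```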

What you lose is the ability to run the preprocessor on non-C source files.
Note that integrating the preprocessor and the parser does not prevent
you from viewing the preprocessed output; there is usually a switch to
generate this.

peter@ficc.uu.net (Peter da Silva) (10/26/89)

In article <2185@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
> What you lose is the ability to run the preprocessor on non-C source files.

Well, that writes off Imakefile. Oh well, X on an IBM is a lost cause.
-- 
Peter da Silva, *NIX support guy @ Ferranti International Controls Corporation.
Biz: peter@ficc.uu.net, +1 713 274 5180. Fun: peter@sugar.hackercorp.com. `-_-'
"That particular mistake will not be repeated.  There are plenty of        'U`
 mistakes left that have not yet been used." -- Andy Tanenbaum (ast@cs.vu.nl)

dave@charyb.COM (Dave Rifkind) (10/26/89)

Just out of interest, would something like this do the job?

     #pragma abandon _FILENAME_H
     #ifndef _FILENAME_H
     #define _FILENAME_H
     ...
     #endif /* _FILENAME_H */

...where "#pragma abandon <name>" means "immediately terminate
processing this file if <name> is #defined"?

nagle@well.UUCP (John Nagle) (10/27/89)

In article <1989Oct24.060920.28655@cs.rochester.edu> ken@cs.rochester.edu writes:
->But is this really as inefficient as people think? I tried the
->following on a Sun-4/60
->
->% wc grammar0.cc
->     932    2944   19700 grammar0.cc
->% g++ -I../h -E grammar0.cc | wc
->    3728    8219   63497
->% time g++ -I../h -E grammar0.cc > /tmp/foo.cc
->0.4u 0.3s 0:01 44% 0+208k 0+9io 0pf+0w
->
->Looks pretty insignificant compared to parsing and CG time.

       What are you comparing to what?  Only one time measurement is given.
This makes it rather meaningless to draw any conclusions.


					John Nagle

dspoon@ncratl2.Atlanta.NCR.COM (Dave Witherspoon) (10/27/89)

In article <950@dutrun.UUCP>, leo@duttnph.tudelft.nl (Leo Breebaart) writes:
> The problem of managing include-files for c++ libraries is also
> discussed in the following paper:
> 
> Managing C++ Libraries
> James M. Coggins & Gregory Bollella
> SIGPLAN Notices, Vol. 24, No. 6
> 

I did not see Nagle's original posting, so I cannot comment on his
proposal.  However, many of the suggestions I read I have already tried
(only to have them fail) and others will not work in the MS-DOS world.
There is one solution we have found that works (hint: Leo knows) in *any*
environment.

Say I have 2 classes, A and B.  A contains a B*, and B contains an A*.
Thus, we have a circular dependency!  One proposal (that I've tried):

A.hpp                                B.hpp
-----                                -----
#ifndef D_A                          #ifndef D_B
#define D_A                          #define D_B
#include "B.hpp"                     #include "A.hpp"

class A { B* bp; };                  class B { A* ap; };
#endif                               #endif

Now if we compile B.cpp, which does an #include of B.hpp, then an
error will emerge.  The way it happens is:
	1. D_B isn't #defined, so #define it
	2. Hop into A.hpp
	3. D_A isn't #defined, so #define it
	4. Hop into B.hpp (again)
	5. D_B *is* #defined (here's the rub...we haven't seen class B yet)
	6. Cruise on into class A
	7. Hey...what the heck's a B?

Bummer...that didn't work.  How's about moving the #define of D_A and
D_B behind the class right before the #endif...No cigar.  Compiling
B.cpp again:
	1. D_B isn't defined, so...
	2. Hop into A.hpp
	3. D_A isn't defined, so...
	4. Hop into B.hpp
	5. D_B isn't defined, so...
	6. Hop into etc., etc., etc.

Dang.  Foiled again.  Let's try this:

A.hpp                                B.hpp
-----                                -----
#ifndef D_A                          #ifndef D_B
#define D_A                          #define D_B

#ifndef D_B                          #ifndef D_A
#include "B.hpp"                     #include "A.hpp"
#endif                               #endif

class A { B* bp; };                  class B { A* ap; };
#endif                               #endif

Compiling B.cpp again, we will see the same result as the first
method...by the time we see B* in class A, class B has not yet been
defined.  
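
For the particular cycle shown, where each class only holds a pointer to the other, a forward declaration is another way out that does not depend on inclusion order. This is not the Coggins scheme described in this post. It is a sketch in C, with structs standing in for the classes and both headers collapsed into one file so that it compiles on its own. The guard names D_A and D_B come from the post; link_pair is invented for the example.

```c
/* --- what A.h would contain --- */
#ifndef D_A
#define D_A
struct B;                      /* forward declaration: enough for a pointer */
struct A { struct B *bp; };
#endif

/* --- what B.h would contain --- */
#ifndef D_B
#define D_B
struct A;                      /* harmless if struct A is already complete */
struct B { struct A *ap; };
#endif

/* Points an A and a B at each other; returns 1 if the links are consistent. */
int link_pair(struct A *a, struct B *b)
{
    a->bp = b;
    b->ap = a;
    return a->bp->ap == a && b->ap->bp == b;
}
```

Neither header needs to include the other in this arrangement, so the order in which they are read no longer matters. It stops being enough as soon as one type contains the other by value or calls its members inline.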

#pragma "once" isn't supported in MSC.  Preprocessor proposals are
frequent but rarely acted upon amongst all the major vendors.  Besides, 
I need something NOW!

We have fully implemented Coggins' proposal and it works great.  I could
explain in detail, but I'll give a brief explanation.  There is one file
(prelude.hpp) that checks for the #definition of each type of object in the
system, and that file guarantees that *all* the necessary .hpp files are
#included _AND_ it guarantees that they are #included in the CORRECT ORDER
for the specific object being compiled.  We have 140 classes, so this
method takes all our troubles away.  An example:

B depends on A and A2, A depends on Z, Z depends on Y.

B.d              B.hpp          B.cpp                     prelude.hpp
---              -----          -----                     -----------
#define D_A      class B {      #define D_B               ...magic...
#define D_A2       :            #include "prelude.hpp"
                   :
                 };

The new one there is B.d.  It explicitly declares all of B's first-level
dependencies.  You have to do this anyway, but this way it is clear and
explicit.  A.d would mention its Z dependency, and Z.d would mention Y.
When you compile B.cpp, prelude.hpp is opened ONCE.  D_Y, D_Z, D_A,
D_A2, and D_B are found to be #defined, and their respective .hpp files
are #included IN THAT ORDER.  Each .hpp file is opened precisely ONCE.

Sorry to take so much space.  If you want more info, contact me.
Counsel rests.

-------------------------------David Witherspoon-------------------------------
D.Witherspoon@Atlanta.NCR.COM         | "Dolphins find humans amusing, but 
NCR Retail Systems Development/Atlanta|  they don't want to talk to them."
MY OPINIONS...ALL MINE!!!             |               - David Byrne

diamond@csl.sony.co.jp (Norman Diamond) (10/27/89)

Someone:

>>>      I propose a solution via compiler optimization.

In article <11396@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:

>>Your solution does not at all seem to me to preserve existing semantics.

In article <1989Oct25.164145.29980@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:

>Can you elaborate on this, Doug?  Seems to me like what he was proposing --
>have compiler recognize files bracketed with `#ifndef FOO_H' and remember
>the bracketing -- comes under the "as if" rule.  Re-including such a file
>with FOO_H defined cannot possibly have any effect except to slow down
>the compilation.

I mostly agree with Mr. Spencer.  However, perhaps we need an additional
interpretation ahead of this one.  Is inclusion a "volatile" operation?

Hmm.  In fact, if a program is being interpreted instead of compiled,
the program itself can be responsible for its include files being
changed between two successive inclusions.  Now maybe C will finally
replace LISP.  :-) :-)

-- 
Norman Diamond, Sony Corp. (diamond%ws.sony.junet@uunet.uu.net seems to work)
  Should the preceding opinions be caught or     |  James Bond asked his
  killed, the sender will disavow all knowledge  |  ATT rep for a source
  of their activities or whereabouts.            |  licence to "kill".

ken@cs.rochester.edu (Ken Yap) (10/27/89)

|->But is this really as inefficient as people think? I tried the
|->following on a Sun-4/60
|->
|->% wc grammar0.cc
|->     932    2944   19700 grammar0.cc
|->% g++ -I../h -E grammar0.cc | wc
|->    3728    8219   63497
|->% time g++ -I../h -E grammar0.cc > /tmp/foo.cc
|->0.4u 0.3s 0:01 44% 0+208k 0+9io 0pf+0w
|->
|->Looks pretty insignificant compared to parsing and CG time.
|
|       What are you comparing to what?  Only one time measurement is given.
|This makes it rather meaningless to draw any conclusions.

Sorry, sloppy of me:

% time g++ -I../h -c grammar0.cc
10.5u 2.5s 0:22 57% 0+1532k 34+37io 202pf+0w

About 5% of the total CPU time (0.7 of 13.0 seconds, user plus system) is
not something I care to worry about at this time.

dog@cbnewsl.ATT.COM (edward.n.schiebel) (10/27/89)

From article <1030@ncratl2.Atlanta.NCR.COM>, by dspoon@ncratl2.Atlanta.NCR.COM (Dave Witherspoon):
> [...stuff deleted]
> Say I have 2 classes, A and B.  A contains a B*, and B contains an A*.
> Thus, we have a circular dependency!  One proposal (that I've tried):
> 
> A.hpp                                B.hpp
> -----                                -----
> #ifndef D_A                          #ifndef D_B
> #define D_A                          #define D_B
> #include "B.hpp"                     #include "A.hpp"
> 
> class A { B* bp;}                    class B {A* ap;}
> #endif                               #endif
> 
Since class A only has a B*, and not a B you don't need to #include B.hpp
but only tell the compiler B is a class. Thus:

#ifndef D_A                          #ifndef D_B
#define D_A                          #define D_B

class B;                             class A;	// DON'T NEED INCLUDE!
 
class A { B* bp; };                  class B { A* ap; };
#endif                               #endif

Now, if A included an instance of a B and vice versa, well, then
you have problems.

	Ed Schiebel
	AT&T Bell Laboratories
	dog@vilya.att.com
	201-386-3416

vaughan@mcc.com (Paul Vaughan) (10/27/89)

   From: dspoon@ncratl2.Atlanta.NCR.COM (Dave Witherspoon)

   Say I have 2 classes, A and B.  A contains a B*, and B contains an A*.
   Thus, we have a circular dependency!  One proposal (that I've tried):

This may not solve your exact problem, but the way to do this is
simply

in "A.h"

class B;
class A {
  B* b;
};

and in "B.h"

class A;
class B {
  A* a;
};

	However, this does bring up a real issue.  This sort of thing
works, but not when you try to define inline functions inside the .h
files.  Putting inlines in .h files requires the inclusion of other .h
files, that would not otherwise be necessary.  Since all base class
and component types must be defined before a class definition, a
single inline function can force a whole tree of .h files to be
included, (easily 10 more files) where they would not have been
necessary otherwise.  Of course, there is no choice when working with
ATT's cfront, because inline functions must be defined within the
class definition.  Also, with GNU g++, inline functions must be
defined before they are used, or else they simply can't be inlined.
However, at least with the g++ approach, one might structure the files
differently, perhaps, including a .inlines.cc file in those files that
really need to call inline methods and have them inlined.
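
One way the restructuring mentioned above might look, shown as a single file
for brevity.  The file boundaries marked in the comments, including the name
A.inlines.h, are assumptions for illustration.  The class definitions need
only forward declarations; the inline member bodies come afterward, once both
classes are complete.  All member names here are hypothetical.

```cpp
#include <cassert>

// ---- A.h (sketch): needs only a forward declaration of B ----
class B;
class A {
public:
    explicit A(B* b) : b_(b) {}
    int twice_b_value() const;     // declared here, defined inline later
private:
    B* b_;
};

// ---- B.h (sketch): needs only a forward declaration of A ----
class A;
class B {
public:
    explicit B(int v) : a_(0), value_(v) {}
    int value() const { return value_; }   // uses nothing from A, safe here
    void attach(A* a) { a_ = a; }
    A* owner() const { return a_; }
private:
    A* a_;
    int value_;
};

// ---- A.inlines.h (hypothetical name): included only by files that call
// the inline members; B is a complete type by this point ----
inline int A::twice_b_value() const { return 2 * b_->value(); }

// Small self-check exercising both classes.
inline int inline_demo() {
    B b(21);
    A a(&b);
    b.attach(&a);
    assert(b.owner() == &a);
    return a.twice_b_value();
}
```

A client that only passes A pointers around would include A.h alone and never
see B.h; only callers that want twice_b_value() inlined pay for the extra
headers.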

Does anybody have a reasonable solution for decoupling the .h file
dependencies introduced by inlines for either g++ or c++?

 Paul Vaughan, MCC CAD Program | ARPA: vaughan@mcc.com | Phone: [512] 338-3639
 Box 200195, Austin, TX 78720  | UUCP: ...!cs.utexas.edu!milano!cadillac!vaughan

shap@delrey.sgi.com (Jonathan Shapiro) (10/28/89)

In article <1030@ncratl2.Atlanta.NCR.COM> dspoon@ncratl2.Atlanta.NCR.COM (Dave Witherspoon) writes:
>
>Say I have 2 classes, A and B.  A contains a B*, and B contains an A*.
>Thus, we have a circular dependency!  One proposal (that I've tried):
>

The following should do what you want.  The trick is to recognize that
you don't really have a circular dependency at all:

A.hpp                           B.hpp
-----                           -----

#ifndef A_HPP                   #ifndef B_HPP
#define A_HPP                   #define B_HPP

class B;                        class A;

class A { B* bp; };             class B {A* ap;};

#endif                          #endif

All C++ needs to know is that A/B are classes to construct a pointer
to them.  It doesn't need to know what the contents are until you go
to use them.

The only disadvantage to this approach is that A's inline member
functions won't get inlined into the B inline member functions that
are defined in the header, and vice versa.

This sort of problem is tricky.  If you try to do what you were doing,
it is very easy for a 60 line implementation file to haul in 6000
lines worth of headers.  I've seen it happen.

gwyn@smoke.BRL.MIL (Doug Gwyn) (10/28/89)

In article <1989Oct25.164145.29980@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
-Can you elaborate on this, Doug?  Seems to me like what he was proposing --
-have compiler recognize files bracketed with `#ifndef FOO_H' and remember
-the bracketing -- comes under the "as if" rule.  Re-including such a file
-with FOO_H defined cannot possibly have any effect except to slow down
-the compilation.

I received some private correspondence on this also, and apparently I
didn't grasp the actual meat of the proposal.

I suppose if further #undef/#define of the identifier were properly
tracked, it would work.
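
A rough sketch of the bookkeeping this implies, assuming the compiler keeps a
table from file name to guard macro and consults the current macro table at
each #include.  GuardCache, note_guard, and should_skip are hypothetical
names, not part of any real preprocessor, and the macro table is reduced to a
set of defined names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical model of the proposed optimization.
class GuardCache {
public:
    // Called after a file has been read once and found to be fully wrapped
    // in "#ifndef guard ... #endif" (rule 1 of the proposal).
    void note_guard(const std::string& file, const std::string& guard) {
        guards_[file] = guard;
    }

    // Called at each #include (rule 2). The test is made against the macro
    // table as it is at that moment, so an intervening #undef causes the
    // file to be read again.
    bool should_skip(const std::string& file,
                     const std::set<std::string>& defined) const {
        std::map<std::string, std::string>::const_iterator it =
            guards_.find(file);
        return it != guards_.end() && defined.count(it->second) != 0;
    }

private:
    std::map<std::string, std::string> guards_;   // file name -> guard macro
};

// Walks through first include, re-include, and re-include after #undef.
inline bool guard_cache_demo() {
    GuardCache cache;
    std::set<std::string> defined;

    assert(!cache.should_skip("foo.h", defined));  // never seen: must read
    defined.insert("FOO_H");                       // effect of "#define FOO_H"
    cache.note_guard("foo.h", "FOO_H");
    bool skipped_second = cache.should_skip("foo.h", defined);

    defined.erase("FOO_H");                        // effect of "#undef FOO_H"
    bool skipped_after_undef = cache.should_skip("foo.h", defined);

    return skipped_second && !skipped_after_undef;
}
```

Because the decision depends only on whether the guard is defined when the
#include is reached, no separate tracking of #undef is needed beyond the
macro table the compiler already maintains.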

jeffa@hpmwtd.HP.COM (Jeff Aguilera) (10/28/89)

Here's my approach for preventing superfluous includes, without relying on the
`implementation-defined' #pragma directive, modifications to #include semantics,
or an impressively intelligent compiler.  

A module is divided into three source files:

    .h  [header]    interface definition; inlines prohibited
    .m  [macro]     inlines; one-liners and common idioms only
    .c  [c]         C++ source

Each .[hmc] file normally includes just one other file, a corresponding 
preprocessor file .[hmc]pp:

    //foo.h -- a foo gadget
    #include "foo.hpp"
    <interface>

    //foo.m -- a foo gadget
    #include "foo.mpp"
    <inlines>

    //foo.c: foo.h -- a foo gadget
    #include "foo.cpp"
    <C++ implementation>

These preprocessor files are generated automatically from the source by
inspecting the first line, and extracting comments that look like this:

    //use: "foo.h bar.h"
    //use: <string.h iostream.h X11/Xlib.h>

The `extract .?pp' program (xpp) generates files that look like this:

    //foo.hpp
    #ifndef foo_h
    #define foo_h "Fri Oct 27 10:25:37 PDT 1989 foo.h"
    #endif

    //foo.mpp
    #ifndef _memory_h
    #include <memory.h>
    #ifndef _memory_h
    #define _memory_h
    #endif
    #endif

    //foo.cpp
    #define self (*this)
    #include "foo.h"
    #include "foo.m"
    #ifndef _string_h
    #include <string.h>
    #ifndef _string_h
    #define _string_h
    #endif
    #endif
    #ifndef _iostream_h
    #include <iostream.h>
    #ifndef _iostream_h
    #define _iostream_h
    #endif
    #endif
    #ifndef _X11_Xlib_h
    #include <X11/Xlib.h>
    #ifndef _X11_Xlib_h
    #define _X11_Xlib_h
    #endif
    #endif
    #ifndef bar_h
    #include "bar.h"
    #include "bar.m"
    #endif

Xpp also extracts comments of the form //bug: and collates the results in a
corresponding `bug-bucket' file, .?bb:

    //foo.cbb
    Known bugs as of Fri Oct 27 10:31:07 PDT 1989
      45:   X& X::operator=(const X&) not implemented
      87:   should raise software assertion rather than aborting
     134:   EOF[3CB7] 2599 characters 

Xpp also forces a file naming policy, so that porting applications between UNIX
and DOS is only a minor nuisance.  (Case sensitivity, length, character set,
extension.)  Still no topological sort on include dependencies....
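
The guard blocks in the generated files above follow a regular pattern.  A
sketch of how a generator might emit them is given here; it is a guess
reconstructed from the sample output, not the actual xpp source, and
guard_name, system_include_block, and module_include_block are hypothetical
names.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Naming rule inferred from the samples: every character that is not a
// letter or digit becomes '_', and system headers get a leading '_'.
inline std::string guard_name(const std::string& header, bool is_system) {
    std::string name = is_system ? "_" : "";
    for (std::string::size_type i = 0; i < header.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(header[i]);
        name += std::isalnum(c) ? static_cast<char>(c) : '_';
    }
    return name;
}

// Guard block for one system header, e.g. "string.h". The inner #ifndef
// defines the guard only if the header did not define it itself.
inline std::string system_include_block(const std::string& header) {
    const std::string g = guard_name(header, true);
    return "#ifndef " + g + "\n"
           "#include <" + header + ">\n"
           "#ifndef " + g + "\n"
           "#define " + g + "\n"
           "#endif\n"
           "#endif\n";
}

// Guard block for one local module, e.g. "bar": pulls in bar.h and bar.m
// unless bar_h is already defined.
inline std::string module_include_block(const std::string& module) {
    const std::string g = guard_name(module + ".h", false);
    return "#ifndef " + g + "\n"
           "#include \"" + module + ".h\"\n"
           "#include \"" + module + ".m\"\n"
           "#endif\n";
}

// Small self-check against names that appear in the sample output.
inline void xpp_demo() {
    assert(guard_name("X11/Xlib.h", true) == "_X11_Xlib_h");
    assert(guard_name("bar.h", false) == "bar_h");
}
```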

--------
    jeff "reinventing rounder wheels" aguilera

rfg@ics.uci.edu (Ron Guilmette) (10/28/89)

In article <DLD.89Oct25104101@F.GP.CS.CMU.EDU> dld@F.GP.CS.CMU.EDU (David Detlefs) writes:
>Conclusion: IMHO, anybody who's read Nagle's post and implements a C
>preprocessor (or compiler that incorporates one) and doesn't use the
>technique doesn't recognize a good thing when it walks up and sits in
>his/her lap.  Even if your CPP incorporates a mechanism such as
>#pragma once, this will still help if the compiler is used on any of
>the vast existing body of code that doesn't use #pragma once.

Regarding that "vast existing body of code", I have a little csh script
for you:

	#!/bin/csh

	foreach file (*.h)
		cp $file /tmp/$file
		rm -f $file
		echo '#pragma once' > $file
		cat /tmp/$file >> $file
		rm -f /tmp/$file
	end

If you don't grok csh, I will interpret.

// rfg

rfg@ics.uci.edu (Ron Guilmette) (10/28/89)

In article <1030@ncratl2.Atlanta.NCR.COM> dspoon@ncratl2.Atlanta.NCR.COM (Dave Witherspoon) writes:
>
>#pragma "once" isn't supported in MSC.  Preprocessor proposals are
>frequent but rarely acted upon amongst all the major vendors.  Besides, 
>I need something NOW!

Did you even consider the possibility of just porting the GNU preprocessor
to your system and then using it in conjunction with the MS-C compiler?

// rfg

dhesi@sun505.UUCP (Rahul Dhesi) (10/28/89)

In article <1088@odin.SGI.COM> shap@delrey.sgi.com (Jonathan Shapiro)
writes:

>   #include FRED_H

Please be alert for problems.  K&R requires the token after the
"#include" to be a filename enclosed in double quotes or angle
brackets, not an arbitrary symbol.  It was not until the ANSI C
standard that the generalized syntax was blessed.
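
For reference, a small sketch of the generalized form under an ANSI-conforming
preprocessor.  Mapping FRED_H to <string.h> is an assumption made purely for
illustration (the quoted line does not say what file FRED_H names), and
fred_demo is a hypothetical function.

```cpp
#include <cassert>

// Assumption: FRED_H names <string.h> only for the sake of the example.
#define FRED_H <string.h>
#include FRED_H          // operand is macro-expanded, then treated as <...>

// Uses a function declared by the header that FRED_H expanded to.
inline unsigned long fred_demo() {
    return static_cast<unsigned long>(strlen("fred"));
}
```

A K&R-era preprocessor that expects a literal quoted or bracketed file name
after #include may reject the second directive outright.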

Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi
Use above addresses--email sent here via Sun.com will probably bounce.

marc@dumbcat.UUCP (Marco S Hyman) (10/29/89)

In article <1011@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
    In article <1088@odin.SGI.COM> shap@delrey.sgi.com (Jonathan Shapiro)
    writes:
    
    >   #include FRED_H
    
    Please be alert for problems.  K&R requires the token after the
    "#include" to be a filename enclosed in double quotes or angle
    brackets, not an arbitrary symbol.

And the C compiler shipped with System V/386 3.2 (ISC's flavor) coffs
(I couldn't help myself ;-) on that one -- At least it didn't like
#include __FILE__.  (Another reason to use gcc/g++).

--marc
-- 
// Marco S. Hyman		{ames,pyramid,sun}!pacbell!dumbcat!marc

dhesi@sunscreen.UUCP (Rahul Dhesi) (10/30/89)

In article <1087@odin.SGI.COM> shap@delrey.sgi.com (Jonathan Shapiro) writes:
>The current most widely accepted solution for single inclusion is to
>insert a pragma into the header file:
>
>   #pragma once

A serious mistake, because this pragma can affect the meaning of a
program, and therefore cannot be safely ignored.

Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi
Use above addresses--email sent here via Sun.com will probably bounce.

dspoon@scotty.Atlanta.NCR.COM (Dave Witherspoon) (11/01/89)

In article <3531@cadillac.CAD.MCC.COM>, vaughan@mcc.com (Paul Vaughan) writes:
} 
}    From: dspoon@ncratl2.Atlanta.NCR.COM (Dave Witherspoon)
} 
}    Say I have 2 classes, A and B.  A contains a B*, and B contains an A*.
}    Thus, we have a circular dependency!  One proposal (that I've tried):
} 
} This may not solve your exact problem, but the way to do this is
} simply
} 
}  [...solution omitted...]
} 
} 	However, this does bring up a real issue.  This sort of thing
} works, but not when you try to define inline functions inside the .h
} files.  Putting inlines in .h files requires the inclusion of other .h
} files, that would not otherwise be necessary.  Since all base class
} and component types must be defined before a class definition, a
} single inline function can force a whole tree of .h files to be
} included, (easily 10 more files) where they would not have been
} necessary otherwise.  

I didn't go into all the other problems that happen when you don't get
your dependent files included.  A containing a B* made for an easy
example, but (as you mention) additional problems show up when you need
to invoke a B method (inline or not) from an A inline method.  Also if
A contained a B (not a B*), you're bit again.  The Coggins approach I
mentioned solves three problems:  (1) many files open, (2) multiple opens of
the same .h file, and (3) correct order of inclusion.

-------------------------------David Witherspoon-------------------------------
D.Witherspoon@Atlanta.NCR.COM         | "Dolphins find humans amusing, but 
NCR Retail Systems Development/Atlanta|  they don't want to talk to them."
MY OPINIONS...ALL MINE!!!             |               - David Byrne

dspoon@scotty.Atlanta.NCR.COM (Dave Witherspoon) (11/01/89)

In article <1144@odin.SGI.COM>, shap@delrey.sgi.com (Jonathan Shapiro) writes:
> In article <1030@ncratl2.Atlanta.NCR.COM> dspoon@ncratl2.Atlanta.NCR.COM (Dave Witherspoon) writes:
> >
> >Say I have 2 classes, A and B.  A contains a B*, and B contains an A*.
> >Thus, we have a circular dependency!  One proposal (that I've tried):
> >
> 
> The following should do what you want.  The trick is to recognize that
> you don't really have a circular dependency at all:
> 
> All C++ needs to know is that A/B are classes to construct a pointer
> to them.  It doesn't need to know what the contents are until you go
> to use them.

If instead my A class contains a B, then the compiler MUST know the full
size of B.  Or if my A class has an inline function that invokes some B
method, then the compiler must see B's class declaration.  I believe that in
either of these cases it is truly a circular dependency.  In fact,
we even have dependency "triplets".
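
The distinction can be seen in a small sketch (an illustration, not code from
the system described; Holder, payload, and containment_demo are hypothetical
names).  A pointer member needs only a forward declaration, while a by-value
member needs the complete class, so the order of definition matters.

```cpp
#include <cassert>

class B;                    // forward declaration: B is incomplete here

// class Bad { B b; };      // would not compile: the size of B is unknown

class A {
    B* bp;                  // fine: a pointer's size does not depend on B
public:
    A() : bp(0) {}
    bool empty() const { return bp == 0; }
};

class B {                   // B may hold an A by value: A is complete
    A a;
    int payload;
public:
    B() : payload(7) {}
    int get() const { return payload; }
    bool a_empty() const { return a.empty(); }
};

class Holder {              // only now can a class contain a B by value
    B b;
public:
    int get() const { return b.get(); }
};

// Small self-check: Holder embeds a whole B, so it is at least as large.
inline int containment_demo() {
    Holder h;
    assert(sizeof(Holder) >= sizeof(B));
    return h.get();
}
```

If A and B each held the other by value, no ordering of the two definitions
could satisfy both, which is why at least one side has to be a pointer or
reference.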

Other "wrapper" type methods I tried did not solve this problem.
-------------------------------David Witherspoon-------------------------------
D.Witherspoon@Atlanta.NCR.COM         | "Dolphins find humans amusing, but 
NCR Retail Systems Development/Atlanta|  they don't want to talk to them."
MY OPINIONS...ALL MINE!!!             |               - David Byrne