dld@f.gp.cs.cmu.edu (David Detlefs) (11/12/88)
There have been a number of interesting suggestions on how to go about including header files only as needed. Someone suggested including them using something like an #include_once directive. It seems to me that the "include-once-ability" of a header file ought to be a property of the file, independent of the context in which it is included. This would favor putting something in the file itself. The "#ifndef/#define/<body>/#endif" style, while not pretty, is certainly sufficient and requires no modifications to cpp.

Every so often, someone writes an include file that is really meant to be included multiple times -- it contains #ifdef's that depend on symbols whose values are changed between inclusions. Files of this kind could not be include-once. If we did want to modify cpp, though, we could take care of both situations minimally by recording, as we process the inclusion of a file, all the preprocessor symbols used in #ifdef's and the values they had at the time of this inclusion. We would store this list in a table, and when we next included the file, check the current values of those symbols. If they are the same as before, don't include the file; otherwise do.

It would be nice if you could precompile each header file to get a symbol table (both preprocessor and language tables -- this scheme would require an integrated preprocessor/compiler) that you would dump to a file. Maybe you would call this a ".hx" file. Whenever you include a .h file, check for a corresponding .hx file in the same directory. If you find it, look at the .hx file; in there, you've recorded this same list of external preprocessor symbols it depends on and the values for those that were used when this was compiled. If they agree with the current values for those symbols, then you just read in the .hx file. (I have some ideas about how to dump/undump the symbol table, but that's another discussion.)

I think we really need something like this to get decent compilation performance. The project I work on requires all programs written in the language we're writing to include a file that expands to upwards of 3500 lines of code. Compiles take on the order of 5 minutes on an IBM RT/PC. Not very good. I'd like to see if some scheme like the above could make this better.

--
Dave Detlefs            Any correlation between my employer's opinion
Carnegie-Mellon CS      and my own is statistical rather than causal,
dld@cs.cmu.edu          except in those cases where I have helped to form
                        my employer's opinion. (Null disclaimer.)
--
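For readers who have not met the "#ifndef/#define/<body>/#endif" style mentioned above, here is a minimal sketch of it. To keep the example in one file, the guarded text is simply written out twice, standing in for two #includes of the same header; the header name counter.h, the guard symbol COUNTER_H, and the counter functions are made up for illustration.

```c
#include <assert.h>

/* ---- first copy: what a guarded "counter.h" (hypothetical) contains ---- */
#ifndef COUNTER_H
#define COUNTER_H
struct counter { int value; };

static int counter_next(struct counter *c)
{
    assert(c != 0);
    return ++c->value;
}
#endif /* COUNTER_H */

/* ---- second copy: stands in for a second #include of the same file ---- */
#ifndef COUNTER_H
#define COUNTER_H
struct counter { int value; };          /* never seen: COUNTER_H is set */

static int counter_next(struct counter *c)
{
    assert(c != 0);
    return ++c->value;
}
#endif /* COUNTER_H */

/* Uses the single surviving definition twice. */
static int guard_demo(void)
{
    struct counter c = { 0 };
    counter_next(&c);
    return counter_next(&c);
}
```

Without the guard, the second copy would redefine struct counter and the translation unit would not compile. The preprocessor still has to open and scan the file on every inclusion, which is the cost the rest of the thread is trying to avoid.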
jim@athsys.uucp (Jim Becker) (11/15/88)
From article <3561@pt.cs.cmu.edu>, by dld@f.gp.cs.cmu.edu (David Detlefs):
> [stuff deleted]
>
> It would be nice if you could precompile each header file to get a
> symbol table (both preprocessor and language tables -- this scheme
> would require an integrated preprocessor/compiler) that you would dump
> to a file. [clipped]
>

Something fairly equivalent to this existed in the Amiga world, and its net effect was excellent. I found that using this type of approach made development on a 68000/floppy based Amiga faster than on an 80286/harddisk system! This was part of the Manx development system (aka Aztec C).

The implementation consisted of a special compiler option that would take the symbol table the compiler currently had and output its contents as a "library" file. Not a standard type of library, but a library of symbols that were known. I created this using a C file that contained #include statements for all the standard include files that I used in the system. They were then bundled into this small binary format library.

During subsequent compilation, the created symbol library file was named on the command line. Any include files that were needed during the compilation process were semantically extracted from the symbol library when they were needed. Include files that were not in the symbol library were processed normally.

The result was a DRAMATIC improvement in the compilation speed! I have been surprised that this concept hasn't yet come into other products that I have crossed paths with, as it seems very elegant and nice (w/ not much effort).

We just received the Oregon C++ compiler; I hope they may integrate something such as this into their compiler to help further improve the speed of compilation!!

-Jim Becker
ok@quintus.uucp (Richard A. O'Keefe) (11/15/88)
In article <186@tityus.UUCP> jim@athsys.uucp (Jim Becker) describes >a special option for >the compiler that would take the symbol table that it currently had >and outputing the content as a "library" file. >I have been surprised that this concept hasn't yet come into >other products that I have crossed paths with, as it seems very >elegant and nice (w/ not much effort). That's an _old_ technique! Burroughs were using it _years_ ago on their mainframe Algol, Fortran, COBOL, &c compilers. It has been described several times in the technical literature. And of course Simula 67 was doing fully type-checked separate compilation with symbol-table files before C (plain C) was dreamed of.
jim@athsys.uucp (Jim Becker) (11/17/88)
From article <681@quintus.UUCP>, by ok@quintus.uucp (Richard A. O'Keefe):
> In article <186@tityus.UUCP> jim@athsys.uucp (Jim Becker) describes
>>a special option for
>>the compiler that would take the symbol table that it currently had
>>and outputing the content as a "library" file.
>
>>I have been surprised that this concept hasn't yet come into
>>other products that I have crossed paths with, as it seems very
>>elegant and nice (w/ not much effort).
>
> That's an _old_ technique! Burroughs were using it _years_ ago on
> their mainframe Algol, Fortran, COBOL, &c compilers. It has been
> described several times in the technical literature. And of course
> Simula 67 was doing fully type-checked separate compilation with
> symbol-table files before C (plain C) was dreamed of.

Ok -- I guess that it's been around for a long time. I haven't worked with any Burroughs equipment and hadn't seen it w/ the machines that I have worked on.

If this is such a time-honored technique, the next question is "Why isn't it used everywhere?". Like I said, I have only seen it implemented on the Amiga, but I have worked with just about all the DEC line, the Data General line (thru 83), Sun and others. Is there some legal problem with people putting it into the compiler?

This is a fairly simple, and I guess widely used in some circles, technique that could greatly aid this include dependency problem of C++. Why isn't this already part of the environment?

-Jim Becker
rfg@nsc.nsc.com (Ron Guilmette) (11/17/88)
In article <3561@pt.cs.cmu.edu> dld@f.gp.cs.cmu.edu (David Detlefs) writes:
>There have been a number of interesting suggestions on how to go about
>including header files only as needed...
>... It seems to me that the "include-once-ability" of a header
>file ought to be a property of the file, independent of the context...
>...Every so often, someone writes an include file that is really meant to be
>included multiple times...

Ever heard of unix links?

>...If we did want to modify cpp, though, we could
>take care of both situations minimally by recording as we process the
>inclusion of a file all the preprocessor symbols used in #ifdef's and
>the values they had at the time of this inclusion. We would store
>this list in a table, and when we next included the file, check the
>current values of those symbols. If they are the same as before,
>don't include the file, otherwise do.

I think that there is semantic muddiness in the term "at the time of this inclusion". When is that exactly? At the *start* of the inclusion, somewhere in the middle, or at the end? Keep in mind that the defined values may actually change as a result of the inclusion itself.

My idea was far simpler and far less expensive in compile time. I suggested (and still suggest) that cpp could be taught to remember a list of "already included" filenames. It could then ignore additional #include's for any file in this list. You could easily override this simple logic by making multiple links to individual header files which must be included multiple times.

>It would be nice if you could precompile each header file to get a
>symbol table (both preprocessor and language tables -- this scheme
>would require an integrated preprocessor/compiler) ...

I think that you may want to switch over to comp.lang.ada ;-)

>I think we really need something like this to get decent compilation
>performance.

Yes. That is an issue, isn't it? All the more reason to think about my suggestion.
-- Ron Guilmette National SemiConductor, 1135 Kern Ave. M/S 7C-266; Sunnyvale, CA 94086 Internet: rfg@nsc.nsc.com or amdahl!nsc!rfg@ames.arc.nasa.gov Uucp: ...{pyramid,sun,amdahl,apple}!nsc!rfg
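The "already included" list proposed above could be sketched roughly as follows. This is only an illustration of the bookkeeping, not code from any real cpp; the names seen_files, should_include, and MAX_SEEN are invented for the sketch, and the table is a fixed-size array for brevity.

```c
#include <assert.h>
#include <string.h>

#define MAX_SEEN 256            /* arbitrary cap chosen for this sketch */

struct seen_files {
    const char *names[MAX_SEEN];
    int count;
};

/* Returns 1 if the file should be read now (first sighting of this name),
 * 0 if an earlier #include already brought it in. */
static int should_include(struct seen_files *seen, const char *resolved_name)
{
    assert(seen != NULL && resolved_name != NULL);

    for (int i = 0; i < seen->count; i++)
        if (strcmp(seen->names[i], resolved_name) == 0)
            return 0;

    /* Remember the name; the caller must keep the string alive.
     * If the table is full we fall back to plain #include behaviour. */
    if (seen->count < MAX_SEEN)
        seen->names[seen->count++] = resolved_name;
    return 1;
}
```

Because the comparison is by name, a second link to the same file under a different name counts as a new file, which is exactly the override described in the post.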
dld@f.gp.cs.cmu.edu (David Detlefs) (11/19/88)
To Ron Guilmette -- I couldn't seem to send you mail --

I think your idea is a very good quick-and-dirty solution -- easy to understand, easy to implement. If someone were to take the sources to some public domain cpp and implement it tomorrow (which is about all it would take!), it would get a lot of use. People *shouldn't* write include files that are intended to be included multiple times with different results on each inclusion; that is an obscure practice at best. Yet people do do this; at least they have, in include files that we may have to keep including forever in the name of backward compatibility. Your solution, with the links, works. However, I would hope that you would agree that it is at least somewhat unsatisfying.

I think your solution and mine fall on a continuum; yours is very easy to implement, but requires the user to perform two actions he or she wouldn't normally have to do: using the new flag to cpp (or using the flag to turn it off when necessary, if not including multiply is made the default), and making extra links to certain files. Note that you run the risk of always using the "don't include again mode," and forgetting to turn it off when you include a file from an external library that includes a file that includes file x.h 7 times, each time with a different preprocessor environment.

My solution requires somewhat more work, but in my opinion, not too much, and solves all problems, without, I think, too much cost in performance. I don't think it is "semantically muddy." To explain it again: you include a file. If this file contains no #if-like constructs, it will generate the same code every time it is included, so it will never need to be included again. If it does have an #if-like construct, then it may generate different code if it is included again at a time when the preprocessor symbols that are used in the #if have different values. (Obviously, if we #define one of these symbols before using it, it doesn't depend on the external value.) While we process the #include file for the first time, we record all the externally defined symbols used in #if's, and their values when they were first used. If we encounter an #include of this file again, we include it only if one or more of those symbols has a different value.

This solution requires no special flags, and always gets it right. Your solution has the drawbacks I mentioned, but has the virtue of being simpler to implement, and may be somewhat faster. Both, I think, are valid ideas.

Trying to keep the intellectual discourse friendly...

-- Dave Detlefs
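The record-and-compare scheme described in this post can be sketched as below. It is a rough illustration under simplifying assumptions: macro values are plain strings, the "environment" is a small array, and the names macro_def, header_record, record_dep, and must_reinclude are invented here. It tracks only symbols tested in #if-like lines, which is the limitation the follow-ups in this thread go on to attack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 16             /* arbitrary cap chosen for this sketch */

struct macro_def { const char *name; const char *value; };

/* Value of `name` in the environment `env` (n entries), NULL if undefined. */
static const char *lookup(const struct macro_def *env, size_t n,
                          const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(env[i].name, name) == 0)
            return env[i].value;
    return NULL;
}

struct header_record {
    struct macro_def deps[MAX_DEPS];   /* value == NULL: was undefined */
    size_t ndeps;
};

/* Called the first time the header tests `name` in an #if or #ifdef. */
static void record_dep(struct header_record *h, const struct macro_def *env,
                       size_t n, const char *name)
{
    assert(h->ndeps < MAX_DEPS);
    h->deps[h->ndeps].name = name;
    h->deps[h->ndeps].value = lookup(env, n, name);
    h->ndeps++;
}

/* 1 if some recorded symbol now has a different value (re-include),
 * 0 if every recorded symbol is unchanged (skip the file). */
static int must_reinclude(const struct header_record *h,
                          const struct macro_def *env, size_t n)
{
    for (size_t i = 0; i < h->ndeps; i++) {
        const char *then = h->deps[i].value;
        const char *now = lookup(env, n, h->deps[i].name);
        if ((then == NULL) != (now == NULL))
            return 1;                       /* defined-ness changed */
        if (then != NULL && strcmp(then, now) != 0)
            return 1;                       /* value changed */
    }
    return 0;
}
```

A header with no #if-like lines records no dependencies, so must_reinclude always answers 0 for it, matching the claim in the post that such a file "will never need to be included again".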
gsf@ulysses.homer.nj.att.com (Glenn Fowler[eww]) (11/19/88)
In article <3614@pt.cs.cmu.edu>, dld@f.gp.cs.cmu.edu (David Detlefs) writes:
> ...
> performance. I don't think it is "semantically muddy." To explain it
> again: you include a file. If this file contains no #if-like
> constructs, it will generate the same code every time it is included,
> ...

This breaks with the following simple example:

x.h:
    extern int VARIABLE;
tst.c:
    #include "x.h"
    #define VARIABLE abc
    #include "x.h"

A proper implementation would have to keep a list of all identifiers referenced within a header to determine if a repeated include would produce different results -- the problem is compounded by nested includes.

-- Glenn Fowler (201)-582-2195 AT&T Bell Laboratories, Murray Hill, NJ
uucp: {att,decvax,ucbvax}!ulysses!gsf internet: gsf@ulysses.att.com
campbell@redsox.UUCP (Larry Campbell) (11/21/88)
In article <3614@pt.cs.cmu.edu> dld@f.gp.cs.cmu.edu (David Detlefs) writes:
} ... People *shouldn't* write
}include files that are intended to be included multiple times with
}different results on each inclusion; that is an obscure practice at
}best.

I must disagree. We regularly use the following idiom, which I think is quite defensible, for managing user and error message definitions in our products:

Messages are defined in a header file that looks like this:

------------------------------message.h------------------------------
#ifdef GenerateTable
#define MSG(name, string) char name[] = string;
#else
#define MSG(name, string) extern char name[];
#endif

MSG(err_you_lose, "you lose big time, pal")
MSG(err_get_real, "get real, pal")
MSG(err_swap_mad, "swap read error, you lose your mind")
---------------------------end of message.h--------------------------

Modules that refer to message definitions just include message.h. One very small module then defines the messages:

------------------------------message.c------------------------------
#define GenerateTable
#include "message.h"
---------------------------end of message.c--------------------------

We can then produce foreign language versions of our products more easily because all the messages are confined to one file, and we can also easily produce an appendix for the documentation that lists all possible error messages.

--
Larry Campbell                          The Boston Software Works, Inc.
campbell@bsw.com                        120 Fulton Street
wjh12!redsox!campbell                   Boston, MA 02146
tuck@alanine.cs.unc.edu (Russ Tuck) (11/21/88)
In article <3614@pt.cs.cmu.edu> dld@f.gp.cs.cmu.edu (David Detlefs) writes:
>...My solution ... To explain it
>again: you include a file. If this file contains no #if-like
>constructs, it will generate the same code every time it is included,
>so it will never need to be included again.

This isn't true. An included file may generate different code if any preprocessor definitions have changed or been added. As a trivial example, consider a triv.h file containing only "int fn();", and the following fragment:

    #include "triv.h"
    #define int unsigned long
    #include "triv.h"

Without a good reason and documentation, this is certainly highly questionable code, but it shows that *any* identifier or keyword can be redefined. You can't be sure that an include file will generate the same code twice unless the preprocessor definitions have not changed at all (and even this is probably not sufficient to make a guarantee).

Let me give a motivating example for code like this. I am implementing a hierarchy of mathematical types, derived from a single base, and need to provide math and logic expressions for all these classes. Unfortunately, just providing all the operator methods for the base class is not sufficient. The base class methods return base class values, which cannot be assigned to derived class objects (because the additional info contained in the derived class was lost when the operator method converted the result to the base class). So, I have to redefine all the operator methods for each derived class.

To avoid many potential typographic and minor logic errors, I do this with a single include file with declarations something like:

    type operator+(type)

Then the library user includes a single library definition .h file, which includes the above file multiple times, something like this:

    #define type base
    #include "my_math_defs.h"
    #define type derived
    #include "my_math_defs.h"

(This is greatly simplified from the real code, but gives a feel for one case where a .h file can usefully generate different code without #if's.)

>...While we process the #include file for the first time, we record all
>the externally defined symbols uses in #if's, and their values when
>they were first used. If we encounter an #include of this file again,
>we include it only if one or more of those symbols has a different
>value.
>
>This solution requires no special flags, and always gets it right.

No. It's not this simple. For complete safety, it must be included again if anything about the set of preprocessor definitions has changed (or perhaps even if the #include just comes in a different code context).

Russ Tuck                               internet: tuck@cs.unc.edu
Computer Science Dept., Sitterson Hall  csnet: tuck@unc
University of North Carolina            uucp: {ihnp4|decvax}!mcnc!unc!tuck
Chapel Hill, NC 27599-3175, USA         Phone: (919) 962-1755 or 962-1932
dld@f.gp.cs.cmu.edu (David Detlefs) (11/22/88)
I got a couple of interesting responses to my last post on this subject, on which I'd like to comment. First, Glenn Fowler quite correctly points out that the simple example

>x.h:
>    extern int VARIABLE;
>tst.c:
>    #include "x.h"
>    #define VARIABLE abc
>    #include "x.h"

breaks my scheme (which was, briefly, to record what preprocessor variables an include file depends on, and their values at the point of first inclusion, and then only reinclude if those values have changed.) Glenn's point is that there is no way to determine what identifiers are preprocessor variables the first time through; any random #define may make what was a perfectly good identifier into a macro. I see no way to save my plan from this fatal flaw, and hereby abandon it publicly (insert "Taps" here...) (These kinds of sick examples point out the essential corruption of cpp 1/2 :-)

I abandon it to the extent that it was supposed to do everything right always while requiring no new semantics and minimal performance cost. It still may be a useful heuristic, as stated, to compete with the "insert an #include_once declaration in the header file (or in the includer)" or "tell cpp to just include once" approaches.

A final note: Larry Campbell protests my classification of the practice of including files multiple times in the same program as "obscure at best," putting forth the following example:

>------------------------------message.h------------------------------
>#ifdef GenerateTable
>#define MSG(name, string) char name[] = string;
>#else
>#define MSG(name, string) extern char name[];
>#endif
>
>MSG(err_you_lose, "you lose big time, pal")
>MSG(err_get_real, "get real, pal")
>MSG(err_swap_mad, "swap read error, you lose your mind")
>---------------------------end of message.h--------------------------

I don't think this is bad usage; I probably didn't make it clear that the practice I thought was bad was the inclusion of the same file multiple times in the same *compilation unit* with different intended results. I assume that you would only need to include message.h once per .c file, with GenerateTable undefined in all but one of the .c files. Fine, no problem; what I would have objected to would be doing this using something like

------------------------------message1.h-----------------------------
#ifdef GenerateTable
#define MSG(name) char name[] = MSG_STRING;
#else
#define MSG(name) extern char name[];
#endif
---------------------------end of message1.h-------------------------

------------------------------message.h------------------------------
#define MSG_STRING "you lose big time, pal"
#include <message1.h>
MSG(err_you_lose)
#define MSG_STRING "get real, pal"
#include <message1.h>
MSG(err_get_real)
#define MSG_STRING "swap read error, you lose your mind"
#include <message1.h>
MSG(err_swap_mad)
---------------------------end of message.h--------------------------

I trust you will agree that this is pretty obscure (at best)!

-- Dave Detlefs
prh@actnyc.UUCP (Paul R. Haas) (11/22/88)
In article <10873@ulysses.homer.nj.att.com> gsf@ulysses.homer.nj.att.com (Glenn Fowler[eww]) writes:
>In article <3614@pt.cs.cmu.edu>, dld@f.gp.cs.cmu.edu (David Detlefs) writes:
>> ...
>> [David discusses his scheme for avoiding including files too often.]
>> ...
>
> [Glenn shows a flaw with David's method.]

I think I have a solution which is ANSI compatible and requires minimal source code changes (some people write their header files this way already).

Make the C preprocessor recognize files of the form:

    comments and whitespace
    #ifndef SOME_SYMBOL
    any properly nested stuff, hopefully including "#define SOME_SYMBOL"
    #endif
    comments and whitespace

and only include them a second time if the symbol is not defined. The preprocessor would have to keep a table of include file names and the symbols that they depend on. Obviously, this should be turned off if the preprocessor is sending comments through.

Advantages:
 o Code will behave the same way with preprocessors without this feature.
 o No #pragmas required
 o Easy to explain to people

Disadvantages:
 o In order for this to speed things up, most header files should be written in a particular form.
 o I am not sure how much effort this will take. It looks like I will end up putting some ugly hacks into cccp.c.
 o Will slightly slow down the preprocessor for header files which lack the #ifndef ... #endif structure.

Feel free to point out something that I missed. I am going to look into adding this to the Gnu preprocessor, cccp. If anyone else is doing something similar, let me know.

-----
Paul Haas uunet!actnyc!prh or prh@frith.egr.msu.edu (212) 696-3653
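The recognition step proposed above might look something like the following. This is a simplified sketch, not the cccp.c change the post describes: it does not skip comments (the post allows them outside the guard), it ignores directives hidden inside comments or strings, and the function name find_guard and the buffer sizes are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* If `text` has the shape  #ifndef SYM ... #endif  with only blank lines
 * outside that pair, copy SYM into `sym` and return 1; otherwise return 0.
 * Simplification: comments are treated as ordinary text. */
static int find_guard(const char *text, char *sym, size_t symsize)
{
    int depth = 0, opened = 0, closed = 0;
    const char *line = text;

    assert(text != NULL && sym != NULL && symsize > 0);
    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        char buf[256], word[16] = "", name[64] = "";
        size_t n = len < sizeof buf - 1 ? len : sizeof buf - 1;
        int got;

        memcpy(buf, line, n);              /* work on one line at a time */
        buf[n] = '\0';
        line = nl ? nl + 1 : line + len;

        if (buf[strspn(buf, " \t\r\f\v")] == '\0')
            continue;                      /* blank line: allowed anywhere */
        if (closed)
            return 0;                      /* text after the guard's #endif */

        /* got: 0 = not a directive, 1 = directive, 2 = directive + name */
        got = sscanf(buf, " # %15[a-z] %63[A-Za-z0-9_]", word, name);
        if (!opened) {
            if (got != 2 || strcmp(word, "ifndef") != 0)
                return 0;                  /* must start with #ifndef SYM */
            snprintf(sym, symsize, "%s", name);
            opened = 1;
            depth = 1;
        } else if (got >= 1) {
            if (strcmp(word, "if") == 0 || strcmp(word, "ifdef") == 0 ||
                strcmp(word, "ifndef") == 0)
                depth++;                   /* nested conditional opens */
            else if (strcmp(word, "endif") == 0 && --depth == 0)
                closed = 1;                /* the guard itself closes */
        }
    }
    return closed;
}
```

A preprocessor using this would store the returned symbol next to the file name after the first inclusion, and on a later #include of the same file skip opening it whenever that symbol is still defined.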
rfg@nsc.nsc.com (Ron Guilmette) (11/24/88)
In article <3614@pt.cs.cmu.edu> dld@f.gp.cs.cmu.edu (David Detlefs) writes:
>To Ron Guilmette -- I couldn't seem to send you mail --

Several people have told me this. I don't know what's happening but I intend to find out!

>I think your idea is a very good quick-and-dirty solution -- easy to
>understand, easy to implement. If someone were to take the sources to
>some public domain cpp and implement it tomorrow (which is about all it
>would take!), it would get a lot of use.

I agree that what I proposed is "quick"; I disagree that it is "dirty". I'll see if I can find the time to make the necessary mods to GNU cpp soon. If I do get it done, I will post the necessary patches here and in the GNU C and GNU C++ newsgroups. The diffs ought to be reasonably small.

>People *shouldn't* write
>include files that are intended to be included multiple times with
>different results on each inclusion; that is an obscure practice at
>best. Yet people do do this; at least they have, in include files that
>we may have to keep including forever in the name of backward
>compatibility.

I generally agree that this seems to be a questionable practice, but I'd rather be a bit less judgemental about it. If people have found what they think are good reasons for doing this, then, to quote the Beatles, "Let it be".

>Your solution, with the links, works. However, I would
>hope that you would agree that it is a least somewhat unsatisfying.

No I don't agree. It tastes great! No. It's less filling! Tastes great! Less filling! ... Well, OK. It is a little kludgy, but then all of cpp is just a big kludge which people keep on using year after year because it does some things very well which cannot yet be done in any other way.

>I think your solution and mine fall on a continuum...

That sounds painful. I would rather fall on my sword than on a continuum!

>... yours is very easy
>to implement, but requires the user to perform two actions he or she
>wouldn't normally have to do: use the new flag to cpp (or use the flag
>to turn it off when necessary, if not including multiply is made the
>default), and making extra links to certain files. Note that you run
>the risk of always using the "don't include again mode," and forgetting
>to turn it off...

OK. The problems you point out are very valid. Looks like it is time for me to switch to plan B.

In retrospect, I admit that the notion of an extra flag to cpp (i.e. the don't do multiple includes flag) was a dumb idea. As you have noted, it lacks true backward compatibility in the purest sense (i.e. not even having to edit your Makefiles). The multiple links to get multiple inclusions of the same single file was also a dumb idea.

Plan B:

So let's fix cpp to do the following. First, under "normal" circumstances, the new cpp will do just what it has traditionally done, i.e. include EACH and EVERY file called for in EACH and EVERY #include directive.

Now for the trick. Pick one of the UNIX protection mode bits which is typically NEVER USED and totally INSIGNIFICANT (at least for normal source code files). Just for the sake of argument, let's pick the "set-gid" bit (i.e. 02000).

Now let's say that we modify cpp so that it will *NOT* re-include any header file which has its set-gid bit set.

Ta da! Presto! No more tricky links, full backwards compatibility, and best of all, the ability to control re-inclusions on a file-by-file basis in a very easy and simple manner which does *not* require any changes to Makefiles or any other files.

The above scheme is simple to implement, simple to use, and simple to understand. The semantics are crystal clear and it can be slowly worked into existing code (both C and C++) on an as-needed basis.

Although it would seem that this scheme is heavily dependent on UNIX-specific file modes, I believe that most operating systems have at least some way of tagging individual files with one more meaningful bit of information. Specifically, MS-DOS also has a per-file mode, and I believe that VMS does also.

>My solution requires somewhat more work...
>...I don't think it is "semantically muddy."...

Is my suggested approach "muddy"? I don't see how.

>again: you include a file. If this file contains no #if-like
>constructs, it will generate the same ***code*** every time it is included,
>so it will never need to be included again.

It is *NOT* true that, if there are no #if's in a given header file, then you will get the same "code" (let alone the same "effect") from each inclusion of that file. Note the following:

    /* something.h */
    int my_array[FOOBAR];

There are no #if's in this file, but the code can still be different on different inclusions if the value of FOOBAR changes.

More significant is that the "effect" of a header file (without any #if's) can be different for different inclusions, i.e.:

    #define NEW_FOOBAR FOO##BAR

>[My] solution requires no special [cpp] flags, and always gets it right.
>Your solution has the drawbacks I mentioned, but has the virtue of
>being simpler to implement, and may be somewhat faster.

As described above, I have figured out how to easily avoid extra cpp flags. As you note, my approach is simple and fast. I rest my case.
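The check that Plan B would add to cpp is small. A rough sketch, assuming a POSIX system where stat() and the S_ISGID mode bit are available; the function names mode_says_include_once and header_is_include_once are made up here, and nothing like this exists in a stock preprocessor.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Pure test on a mode word: is the set-gid bit (02000) on? */
static int mode_says_include_once(mode_t mode)
{
    return (mode & S_ISGID) != 0;
}

/* 1 if the header at `path` is tagged include-once, 0 if it is not,
 * -1 if the file cannot be examined at all. */
static int header_is_include_once(const char *path)
{
    struct stat st;

    assert(path != 0);
    if (stat(path, &st) != 0)
        return -1;
    return mode_says_include_once(st.st_mode);
}
```

Under this scheme a header would be tagged with something like "chmod g+s foo.h" and untagged with "chmod g-s foo.h"; the tag lives in the file system rather than in the file's text, so it is lost whenever the file is copied or mailed without its mode bits.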
ok@quintus.uucp (Richard A. O'Keefe) (11/24/88)
Instead of tinkering with the definition of an existing construct,
why not do the sanitary thing and add a new one?
#use {filename}
resolve a "filename" or <filename> as #include would, and if that
resolved file name has already been #included or #used, do nothing;
otherwise #include it. Easy to do, can't break old code, doesn't
require changes to the file being used. In fact, if you record
the usage of the filename when the #include or #use is _started_,
you can even have files which #use each other (this can be useful).
greg@cantuar.UUCP (G. Ewing) (11/24/88)
Instead of including the same header file several times, what's
wrong with having the header file define a macro, including it
once, and then using the macro several times?
Using Larry Campbell (campbell@redsox.UUCP)'s example (sort of):
------------------------------message.h------------------------------
#define GENERATE_MESSAGES \
MSG(err_you_lose, "you lose big time, pal") \
MSG(err_get_real, "get real, pal") \
MSG(err_swap_mad, "swap read error, you lose your mind")
#undef MSG
---------------------------end of message.h--------------------------
------------------------------message.c------------------------------
#include "message.h"
#define MSG(name, string) char name[] = string;
GENERATE_MESSAGES
#undef MSG
#define MSG(name, string) extern char name[];
GENERATE_MESSAGES
---------------------------end of message.c--------------------------
Hmm... that perhaps wasn't the most useful of examples, but you get
what I mean, I hope. The point is that cpp could safely be hacked
to only ever include things once; anything that can be done with
multiply included files can also be done using macros.
Can anyone think of something that couldn't?
(P.S. I'd allow a backward-compatibility flag, before anyone jumps
on me...)
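The include-once-and-expand-many-times idea above can be sketched in a single file. This is a variant rather than the exact code in the post: instead of redefining MSG between uses, the per-entry macro is passed in as an argument, which avoids the #undef dance. The names MESSAGES, DEFINE_MSG, NAME_OF, TEXT_OF, and find_message are invented for the sketch.

```c
#include <assert.h>
#include <string.h>

/* The message list lives in one macro; each use supplies its own
 * two-argument macro to say what an entry should expand to. */
#define MESSAGES(ENTRY) \
    ENTRY(err_you_lose, "you lose big time, pal") \
    ENTRY(err_get_real, "get real, pal") \
    ENTRY(err_swap_mad, "swap read error, you lose your mind")

/* Use 1: define the message strings themselves. */
#define DEFINE_MSG(name, string) const char name[] = string;
MESSAGES(DEFINE_MSG)

/* Use 2: a table of identifier names, e.g. for a documentation appendix. */
#define NAME_OF(name, string) #name,
static const char *const message_names[] = { MESSAGES(NAME_OF) };

/* Use 3: a parallel table of the texts. */
#define TEXT_OF(name, string) string,
static const char *const message_texts[] = { MESSAGES(TEXT_OF) };

enum { MESSAGE_COUNT = sizeof message_names / sizeof message_names[0] };

/* Look a message up by its identifier name; NULL if there is none. */
static const char *find_message(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < MESSAGE_COUNT; i++)
        if (strcmp(message_names[i], name) == 0)
            return message_texts[i];
    return NULL;
}
```

The list is written once and expanded three different ways without any file being included more than once, which is the point being argued in the post.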
Greg Ewing Internet: greg@cantuar.uucp
Spearnet: greg@nz.ac.cantuar Telecom: +64 3 667 001 x8357
UUCP: ...!{watmath,munnari,mcvax,vuwcomp}!cantuar!greg
Post: Computer Science Dept, Univ. of Canterbury, Christchurch, New Zealand
Disclaimer: The presence of this disclaimer in no way implies any disclaimer.
henry@utzoo.uucp (Henry Spencer) (11/26/88)
In article <738@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>Instead of tinkering with the definition of an existing construct,
>why not do the sanitary thing and add a new one?
> #use {filename}
>... Easy to do, can't break old code, doesn't
>require changes to the file being used.

Well, leaving the existing construct alone (except perhaps in the presence of a compiler flag) is the right thing to do, but "doesn't require changes to the file being used" is a bug, not a feature. In most cases, the fact that #including a file n times is the same as including it once is a property of the file being included, not the file doing the including.

The probability of trouble and mistakes will be much lower if it's the file being included that determines whether future inclusions have any effect. The Waterloo "#pragma idempotent" strikes me as the right method; among other things, it means that your code is portable.

--
Sendmail is a bug,      | Henry Spencer at U of Toronto Zoology
not a feature.          | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
ok@quintus.uucp (Richard A. O'Keefe) (11/26/88)
In article <867@cantuar.UUCP> greg@cantuar.UUCP (G. Ewing) writes:
>Instead of including the same header file several times, what's
>wrong with having the header file define a macro, including it
>once, and then using the macro several times?

It requires that the macro processor be able to handle very large macros, which might not otherwise be required. Also, it is not possible for the two approaches to be equivalent (consider the built-in macros __FILE__ and __LINE__).

>The point is that cpp could safely be hacked
>to only ever include things once; anything that can be done with
>multiply included files can also be done using macros.

But not using the existing sources. If you don't like #include, use something else. It is bad manners to break other people's tools, without even asking them.

As an example of a company with good manners, the High C compiler has a pragma which goes something like this:

    pragma Include(<file>);    /* or Include("file"); */
    pragma C_Include(<file>);  /* or C_Include("file"); */

where C_Include includes the file only if it hasn't already been included. (This is "pragma", by the way, not "#pragma".)

Note that at least in UNIX systems, it is possible to

    #include "/dev/tty"

{This can actually be useful, if you know what you're doing.} There are other kinds of files where the contents may be different, _by design_, when you open them again.
ok@quintus.uucp (Richard A. O'Keefe) (11/27/88)
In article <1988Nov25.180309.9323@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >In article <738@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes: >> #use {filename} >>... Easy to do, can't break old code, doesn't >>require changes to the file being used. > >Well, leaving the existing construct alone (except perhaps in the presence >of a compiler flag) is the right thing to do, but "doesn't require changes >to the file being used" is a bug, not a feature. In most cases, the fact >that #including a file n times is the same as including it once is a >property of the file being included, not the file doing the including. Which is why "doesn't require changes to the file being used" is NOT a design error. If it is the case that including the file N times is ok, then the file doesn't NEED _any_ changes. If it is not the case that including the file N times is ok, then a construct like this should not be used. What I had in mind is the situation where someone *else* owns the header file in question, so that you _can't_ change it. Ok, you can get around that by interposing your own file which says #pragma include_once; #include "the-real-file" but you can just as easily do that by putting #ifndef FOOBAZ_H #include "real_foobaz.h" #define FOOBAZ_H 1 #endif in your "foobaz.h" file. Actually, now that I come to think of it, just how hard _is_ it to write a 4-line wrapper in order to make anmult idempotent version of a header? I'm afraid that it is not really the case that idempotence is "a property of the file being included, not the file doing the including.", only if the file being included contains no identifiers which could be #defined by the caller. Suppose we have extern int strlen(char*); in a file strlen.h. 
Now consider

    #define int double
    #define strlen atof
    #include "strlen.h"
    #undef int
    #undef strlen
    #include "strlen.h"

Not a terribly reasonable thing to do, but sufficient to show that the
idempotence of even such an innocent header as that IS dependent on the
includer, so that it *is* advisable for the includer to explicitly promise
not to break it. "In most cases", true, this doesn't happen, but that is
a property of the includer!
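A single-file sketch of the same effect, for readers who want to compile it. Everything here is an assumption made for illustration: the header is imagined rather than real, its one line of text is written out twice to stand for two inclusions, and the names text_length and text_length_as_double are hypothetical (chosen to avoid clashing with the real strlen and atof).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First "inclusion": the includer has redefined two identifiers that the
   header text happens to use. */
#define int double
#define text_length text_length_as_double
extern int text_length(const char *);   /* text of the imagined header */
#undef int
#undef text_length

/* Second "inclusion": the very same text, with no macros in effect. */
extern int text_length(const char *);   /* text of the imagined header */

/* The identical header text has declared two different functions. */
int text_length(const char *s)
{
    assert(s != NULL);
    return (int)strlen(s);
}

double text_length_as_double(const char *s)
{
    assert(s != NULL);
    return (double)strlen(s);
}
```

Had the second inclusion been suppressed, only text_length_as_double would have been declared, so skipping it would have changed the meaning of the program.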
rfg@nsc.nsc.com (Ron Guilmette) (11/27/88)
In article <1988Nov25.180309.9323@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In most cases, the fact
>that #including a file n times is the same as including it once is a
>property of the file being included, not the file doing the including.

Absolutely right, Henry. I realized this very fact just before I suggested
using the set-uid bit on header files which are not to be included more
than once.

In hindsight though, I now realize that even that was a silly suggestion.
I now think that putting something like:

    #once
or
    #pragma once

into such header files would be more appropriate, more clear, and a more
portable solution.

>The probability of trouble and mistakes will be much lower if it's the
>file being included that determines whether future inclusions have any
>effect.

Exactly right.

>The Waterloo "#pragma idempotent" strikes me as the right method;
>among other things, it means that your code is portable.

Is this (semantically) the same thing as I am suggesting? I have never
heard of this before! If it has the effect of forcing the file it is
found within to only be included once then I guess that there is at least
one clear (although not yet standard) precedent for an approach to this
problem.
campbell@redsox.UUCP (Larry Campbell) (11/28/88)
In article <8080@nsc.nsc.com> rfg@nsc.nsc.com.UUCP (Ron Guilmette) writes:
}In article <1988Nov25.180309.9323@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
}In hindsight though, I now realize that even that was a silly suggestion.
}I now think that putting something like:
}
}	#once
}or
}	#pragma once
}
}into such header files would be more appropriate, more clear, and a more
}portable solution.
...
}>The Waterloo "#pragma idempotent" strikes me as the right method;
}>among other things, it means that your code is portable.

How portable can something be that's not even implemented yet??? If you
just wrap your header files like this:

    #ifndef foo_included
    ...
    #define foo_included
    #endif

it achieves the result you desire, completely portably, today, in both
C++ and C, without requiring changes to preprocessors or funky #pragma
statements (Dr. Coggins' esthetic distaste for this much pragmatism
notwithstanding).

Really, I think this is a SOLVED PROBLEM that's been getting beaten to
death. Could we move on to more interesting aspects of the import/export
question than how to get it done with the current syntax-oriented
paradigm?
--
Larry Campbell                          The Boston Software Works, Inc.
campbell@bsw.com                        120 Fulton Street
wjh12!redsox!campbell                   Boston, MA 02146
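A compilable sketch of the wrapper idiom described in this post, assuming a header whose only content is one struct definition. The guard has to be tested with #ifndef so that the body is processed the first time and skipped afterwards. The header text is written out twice to stand for two inclusions; struct foo and foo_value are hypothetical names, and the #define is placed directly after the test, a common variant of the layout.

```c
#include <assert.h>
#include <stddef.h>

/* First "inclusion" of the imagined header foo.h. */
#ifndef foo_included
#define foo_included
struct foo { int value; };
#endif

/* Second "inclusion": identical text, skipped because foo_included is
   now defined. Without the guard, struct foo would be defined twice,
   which is an error. */
#ifndef foo_included
#define foo_included
struct foo { int value; };
#endif

/* Uses the single definition of struct foo that survived. */
int foo_value(const struct foo *f)
{
    assert(f != NULL);
    return f->value;
}
```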
akwright@watdragon.waterloo.edu (Andrew K. Wright) (11/29/88)
In article <8080@nsc.nsc.com> rfg@nsc.nsc.com.UUCP (Ron Guilmette) writes:
>In article <1988Nov25.180309.9323@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>>In most cases, the fact
>>that #including a file n times is the same as including it once is a
>>property of the file being included, not the file doing the including.
>>
>>The probability of trouble and mistakes will be much lower if it's the
>>file being included that determines whether future inclusions have any
>>effect.
>
>Exactly right.
>
>>The Waterloo "#pragma idempotent" strikes me as the right method;
>>among other things, it means that your code is portable.
>
>Is this (semantically) the same thing as I am suggesting?  I have never
>heard of this before!  If it has the effect of forcing the file it is
>found within to only be included once then I guess that there is at least
>one clear (although not yet standard) precedent for an approach to this
>problem.

I will expand a bit on my original posting about #pragma idempotent
(or #pragma once as Ron Guilmette suggests).

When a file marked with #pragma idempotent is included the first time,
the preprocessor makes a note of this. If the file is later included
again, the include directive is simply ignored.

Determining when the file is included "again" means determining when two
include paths are equal. You might want to go by physical file equality
(i.e. compare devices and i-nodes), pathname equality, or string equality.
The Waterloo compiler uses simple string equality: two include paths are
considered equal if strcmp() on the argument strings of the include
directives, modulo their quotes or <>s, reports they are equal. We think
it is a bad idea to build operating system dependent features such as
links or i-nodes into the semantics of the language; C runs on many
non-UNIX systems without such features. Besides, you will never get the
ANSI committee to agree to such.
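The string-equality rule just described might be sketched as follows. This is not the Waterloo compiler's code. The function names, the fixed-size table and the truncation of long paths are all assumptions made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_IDEMPOTENT 64    /* assumed table capacity */
#define MAX_PATH_LEN   256   /* assumed maximum path length */

static char seen_paths[MAX_IDEMPOTENT][MAX_PATH_LEN];
static size_t seen_count = 0;

/* Copy an include argument into out without its surrounding "" or <>. */
static void strip_delimiters(const char *arg, char *out, size_t out_size)
{
    size_t len = strlen(arg);
    int quoted = len >= 2 && arg[0] == '"' && arg[len - 1] == '"';
    int angled = len >= 2 && arg[0] == '<' && arg[len - 1] == '>';

    if (quoted || angled) {
        arg++;
        len -= 2;
    }
    if (len >= out_size)
        len = out_size - 1;   /* sketch only: over-long paths are truncated */
    memcpy(out, arg, len);
    out[len] = '\0';
}

/* Note that a file carrying the pragma has been included.
   Returns 0 if the table is full, 1 otherwise. */
int mark_idempotent(const char *include_arg)
{
    assert(include_arg != NULL);
    if (seen_count == MAX_IDEMPOTENT)
        return 0;
    strip_delimiters(include_arg, seen_paths[seen_count], MAX_PATH_LEN);
    seen_count++;
    return 1;
}

/* Returns 1 if an include of this argument should be ignored, judged by
   plain strcmp() on the stripped strings. */
int should_skip_include(const char *include_arg)
{
    char path[MAX_PATH_LEN];
    size_t i;

    assert(include_arg != NULL);
    strip_delimiters(include_arg, path, sizeof path);
    for (i = 0; i < seen_count; i++)
        if (strcmp(seen_paths[i], path) == 0)
            return 1;
    return 0;
}
```

Under this rule "foo.h" and <foo.h> compare equal, while "./foo.h" does not, even though it may name the same physical file. That is the price of keeping i-nodes and links out of the language semantics.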
Another poster noted that idempotency is not entirely an attribute of
the included file. This is true; both the includer and the includee must
agree that the file is to have idempotent semantics, or someone will get
surprised. Thus #pragma idempotent is more than just a statement that
redundant #includes of this file will be ignored; it asserts that the
user is NOT ALLOWED to redefine any names appearing in such a file.
This matches the ANSI committee's intent for the standard include files:
the user is not permitted to redefine anything in <stdio.h> for instance.
(He is permitted to #undef macros to get real functions.) <stdio.h> can
therefore be marked with #pragma idempotent, as can all the other
standard include files.

In fact, it just occurs to me that CPP could mark identifiers brought in
by idempotent include files as non-redefinable. This has the obvious
advantage that the ANSI rule "<the several hundred> standard identifiers
cannot be redefined" becomes a compiler enforced rule, not just a vapor
rule. (Personally I hate the idea of several hundred reserved words,
but the committee seems to be well on its way down that path.)

Lastly, #pragma idempotent is upwards compatible. It does not break
*any* (not just few) existing C programs. It does have the disadvantage
that pragmas are not required (by ANSI) to be supported by all compilers,
so you may still have to wrap #ifndefs around your include file bodies
if you expect to have your code ported to any C compiler.

Andrew K. Wright      akwright@watmath.waterloo.edu
CS Dept., University of Waterloo, Ont., Canada.
henry@utzoo.uucp (Henry Spencer) (11/30/88)
In article <562@redsox.UUCP> campbell@redsox.UUCP (Larry Campbell) writes:
>How portable can something be that's not even implemented yet???  If you
>just wrap your header files [in #ifndef]...
>it achieves the result you desire, completely portably, today, in both
>C++ and C, without requiring changes to preprocessors or funky #pragma
>statements...  Really, I think this is a SOLVED PROBLEM ...

Not so. The problem is that a header wrapped in #ifndef still needs to
be opened, read, and scanned completely every time, which is costly when
the #include relationships are complex and common header files get picked
up many, many times. One can imagine a tricky compiler which notices
the wrapping and optimizes this case, but that's not going to be easy.

In practice, a header file using "#pragma idempotent" or whatever would
still include the #ifndef wrapper, for the sake of portability. (There
is no requirement that #pragma be portable, but there *is* a requirement
that ANSI C implementations ignore unrecognized #pragmas.)
--
SunOSish, adj:  requiring      |     Henry Spencer at U of Toronto Zoology
32-bit bug numbers.            | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
darin@Apple.COM (Darin Adler) (12/07/88)
In article <1988Nov29.203751.26424@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
> In article <562@redsox.UUCP> campbell@redsox.UUCP (Larry Campbell) writes:
> >How portable can something be that's not even implemented yet???  If you
> >just wrap your header files [in #ifndef]...
> >it achieves the result you desire, completely portably, today, in both
> >C++ and C, without requiring changes to preprocessors or funky #pragma
> >statements...  Really, I think this is a SOLVED PROBLEM ...
>
> Not so.  The problem is that a header wrapped in #ifndef still needs to
> be opened, read, and scanned completely every time, which is costly when
> the #include relationships are complex and common header files get picked
> up many, many times.  One can imagine a tricky compiler which notices
> the wrapping and optimizes this case, but that's not going to be easy.

I have to disagree here, Henry. Writing a compiler (or preprocessor) that
notices the wrapping and optimizes it is extremely easy. While the
preprocessor is scanning a file it simply sets a "start of file" flag
which is cleared when anything besides a comment is seen. If an #ifndef
occurs with this flag set, the preprocessor symbol can be recorded in a
table, along with the file name. (Of course, another check to be sure
that the #endif is the last non-comment line in the file as well would be
necessary, but that is left as a simple exercise for the implementor.)
Then, any time the same file is included, that symbol can be checked...if
it is defined, the file need not be opened.

Note that even if the system can't recognize "this.h" and <this.h> and
"/this.h" as the same file, there is no problem, since the file protects
itself from multiple inclusion.

> In practice, a header file using "#pragma idempotent" or whatever would
> still include the #ifndef wrapper, for the sake of portability.  (There
> is no requirement that #pragma be portable, but there *is* a requirement
> that ANSI C implementations ignore unrecognized #pragmas.)

Note that the above scheme is superior because it gets the same results as
the pragma without typing it, since you still plan to include the #ifndef
wrapper. I'm ready to recommend to our compiler folks that they implement
the scheme now!
--
Darin Adler
AppleLink: Adler4
UUCP: {amdcad,decwrl,hoptoad,nsc,sun}!apple!darin
Internet: darin@Apple.com
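The detection step of the scheme described in this post could look roughly like the sketch below. It is an illustration under stated assumptions, not code from any real preprocessor: detect_guard and its helpers are hypothetical names, and the input text is assumed to have had its comments removed already, so only blank lines need skipping. It reports a guard only when the first non-blank line is an #ifndef, the matching #endif is the last non-blank line, and the guard has no #else or #elif branch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Skip spaces and tabs, but never a newline. */
static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* If the line at p is the directive `name`, return a pointer just past
   the directive name; otherwise return NULL. */
static const char *match_directive(const char *p, const char *name)
{
    size_t n = strlen(name);

    p = skip_blanks(p);
    if (*p != '#')
        return NULL;
    p = skip_blanks(p + 1);
    if (strncmp(p, name, n) != 0)
        return NULL;
    if (isalnum((unsigned char)p[n]) || p[n] == '_')
        return NULL;              /* e.g. "ifdef" is not "if" */
    return p + n;
}

/* Returns 1 and stores the guard symbol if the whole text is wrapped in
   a single #ifndef SYMBOL ... #endif; returns 0 otherwise. */
int detect_guard(const char *text, char *symbol, size_t symbol_size)
{
    const char *p = text;
    int depth = 0;    /* nesting depth of conditionals */
    int closed = 0;   /* set once the guard's #endif has been seen */
    size_t len = 0;

    assert(text != NULL && symbol != NULL && symbol_size > 0);
    while (*p != '\0') {
        const char *line = skip_blanks(p);
        const char *eol = strchr(p, '\n');
        const char *rest;

        p = eol ? eol + 1 : p + strlen(p);
        if (*line == '\n' || *line == '\0')
            continue;             /* blank line */
        if (closed)
            return 0;             /* something follows the guard's #endif */
        if (depth == 0) {
            /* "Start of file": the first real line must be #ifndef SYMBOL. */
            rest = match_directive(line, "ifndef");
            if (rest == NULL)
                return 0;
            rest = skip_blanks(rest);
            while ((isalnum((unsigned char)rest[len]) || rest[len] == '_')
                   && len + 1 < symbol_size) {
                symbol[len] = rest[len];
                len++;
            }
            symbol[len] = '\0';
            if (len == 0 || isalnum((unsigned char)rest[len]) || rest[len] == '_')
                return 0;         /* missing symbol, or too long to store */
            depth = 1;
        } else if (match_directive(line, "if") || match_directive(line, "ifdef")
                   || match_directive(line, "ifndef")) {
            depth++;
        } else if (match_directive(line, "endif")) {
            if (--depth == 0)
                closed = 1;
        } else if (depth == 1 && (match_directive(line, "else")
                                  || match_directive(line, "elif"))) {
            return 0;             /* guard has an alternative branch */
        }
    }
    return closed;
}
```

A preprocessor using this would store the returned symbol alongside the file name and, on a later #include of the same name, skip opening the file whenever that symbol is still defined.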
henry@utzoo.uucp (Henry Spencer) (12/09/88)
In article <21780@apple.Apple.COM> darin@Apple.COM (Darin Adler) writes:
>... Writing a compiler (or preprocessor) that
>notices the wrapping and optimizes it is extremely easy...

How extremely easy or significantly difficult it is depends on how your
compiler is implemented. However, I concede that I was overly pessimistic
about this, and it's probably the right approach.
--
SunOSish, adj:  requiring      |     Henry Spencer at U of Toronto Zoology
32-bit bug numbers.            | uunet!attcan!utzoo!henry henry@zoo.toronto.edu