bobmon@iuvax.cs.indiana.edu (RAMontante) (03/26/89)
Hey, gurus and standards freekz...

A question is going around comp.sys.ibm.pc just now, to wit: somebody has noticed that (TurboC's) linker links all the routines in a file into his executable, whether a particular routine is actually called or not. This bothers him, because the resulting executable is larger than it really needs to be. Among the flurry of LIBrarian tricks to defeat this, a couple of people are saying that this isn't (TurboC/MSC/whoever)'s fault, because C requires that all routines (all symbols, maybe?) in a file be linked in if any of them are.

Are there any nuggets of correctness in all this? Would someone care to shine some light on it all?

(disclaimer: I may have oversimplified the discussion. If you have a good answer, it's probably worth crossposting to comp.sys.ibm.pc.)
chris@mimsy.UUCP (Chris Torek) (03/26/89)
In article <18925@iuvax.cs.indiana.edu> bobmon@iuvax.cs.indiana.edu (RAMontante) writes:
>... a couple of people are saying that this isn't (TurboC/MSC/whoever)'s
>fault, because C requires that all routines (all symbols, maybe?) in a
>file be linked in if any of them are.

Given that the pANS does not have the concept of a `library', or even of `separate compilation', this is clearly false. It is, however, difficult to tell which of several code and/or data sections may be required. Consider, for instance, the following:

    static void a(), b(), go();
    static void (*table[2])() = { a, b };

    entry_point(int n) { go(&table[0], n); }

    static void go(void (**tab)(), int n) {
        (*tab[n])();    /* this calls either a() or b() */
    }

    static void a() { (void) printf("a called\n"); }
    static void b() { (void) printf("b called\n"); }

It is not possible to tell, at compile time, which of `a' and `b' will be called. If `n' is deleted from entry_point(), and we call `go' with 0, b() can be elided. Discovering this is quite difficult.

More generally, if the link format uses offsets to locations that can be resolved at compile time (such as from entry_point() to go(), if the machine supports pc-relative calls), there may be insufficient information in the object files.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu    Path: uunet!mimsy!chris
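For readers who want to try the table-dispatch example with a current compiler, the following is a minimal sketch in present-day C. It is an adaptation, not Torek's text: the `last_called` variable and the `char` return value of `entry_point()` are assumptions added here so that a caller can observe which handler ran. The point it illustrates is unchanged: the choice between a() and b() depends on a run-time value, so neither can be dropped by looking at the source alone.

```c
#include <stdio.h>

/* Hypothetical addition: records which handler ran last. */
static char last_called = '?';

static void a(void) { last_called = 'a'; printf("a called\n"); }
static void b(void) { last_called = 'b'; printf("b called\n"); }

/* An array of two pointers to functions taking no arguments. */
static void (*table[2])(void) = { a, b };

/* Calls whichever entry of the table the run-time index selects. */
static void go(void (**tab)(void), int n)
{
    (*tab[n])();
}

/* Returns the tag of the handler that was invoked for index n. */
char entry_point(int n)
{
    go(&table[0], n);
    return last_called;
}
```

Calling entry_point(0) runs a() and entry_point(1) runs b(). Only if every call site were known to pass 0 could b() safely be removed, which is the analysis Torek calls quite difficult.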
jacobs%cmos.utah.edu@wasatch.UUCP (Steven R. Jacobs) (03/26/89)
In article <16541@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <18925@iuvax.cs.indiana.edu> bobmon@iuvax.cs.indiana.edu
>(RAMontante) writes:
>>... a couple of people are saying that this isn't (TurboC/MSC/whoever)'s
>>fault, because C requires that all routines (all symbols, maybe?) in a
>>file be linked in if any of them are.
>
>Given that the pANS does not have the concept of a `library', or
>even of `separate compilation', this is clearly false. It is, however,
>difficult to tell which of several code and/or data sections may
>be required. Consider, for instance, the following:
>
>    static void a(), b(), go();
>    static void (*table[2])() = { a, b };
>
>    entry_point(int n) { go(&table[0], n); }
>
>    static void go(void (**tab)(), int n) {
>        (*tab[n])();    /* this calls either a() or b() */
>    }
>
>    static void a() { (void) printf("a called\n"); }
>    static void b() { (void) printf("b called\n"); }
>
>It is not possible to tell, at compile time, which of `a' and `b' will
>be called. If `n' is deleted from entry_point(), and we call `go' with
>0, b() can be elided. Discovering this is quite difficult.

Yes, but suppose the following (common) situation occurs:

    extern void a(), b(), c(), d();         /* NOTE no longer static */
    static void (*table[2])() = { a, b };   /* c() and d() not used here */
    /* extra lines omitted */

and in a different file:

    void a() { (void) printf("a called\n"); }
    void b() { (void) printf("b called\n"); }
    void c() { (void) printf("c called\n"); }
    void d() { (void) printf("d called\n"); }

This situation must not be too hard to detect, since lint will give "function defined but not used" messages in this case. Admittedly, this might not be an appropriate thing for the linker to handle, but it would sure be nice if the librarian would detect such cases and treat them as if they were compiled from separate files, at least when no variables of "file-only" scope are involved.
Static functions that are not used could be completely eliminated, and library functions that are similar could be conveniently grouped into a single source file. I find it easier to manage libraries of 20,000 lines of code when they are in a few dozen files of a few hundred lines each as opposed to hundreds of files, many of which contain similar functions that are only 5 to 10 lines of code. The limitations of present linkers/librarians force my programs to be larger than they need to be, or force me to deal with hundreds of source files.

Steve Jacobs ({ihnp4,decvax}!utah-cs!jacobs, jacobs@cs.utah.edu)
chris@mimsy.UUCP (Chris Torek) (03/27/89)
In article <16541@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>Given that the pANS does not have the concept of a `library', or
>even of `separate compilation', ...

I should probably rephrase that. It does have something called `external linkage'; it just does not tie it specifically to `separate compilation' and `libraries'. (The difference is that between what must be and what usually is.)

I should also restate my point, which is this: You cannot tell which functions are needed---consider the program

    main() {
        while (the_machine_continues_to_exist())
            /* void */;
        /* never gets here */
        library_function_f();
        exit(0);
    }

---so the best you can do is an approximation (`the function strftime will never be called; the function strcpy might be called; ...'). Unfortunately, unless the link file format has been carefully defined and the compiler cooperates, you cannot even do that:

    _foo:   .globl  _foo
            .word   0
            movl    $_foo+foosize,r0
            calls   $0,(r0)
            ret
            .align  2
    0:      .set    foosize,0b-_foo

The VAX-assembly-code function foo() calls whichever function is linked immediately following it, so eliding that function because it appears unused changes the execution.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu    Path: uunet!mimsy!chris
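The "approximation" Torek describes can be pictured as reachability over a call graph. The sketch below is an illustration under stated assumptions, not any real linker's algorithm: the enum names, the `edges` table and `mark_reachable()` are all hypothetical, and the edges are simply transcribed from the main() program above. It also assumes the object format records which function references which, and that is exactly the information Torek warns may be missing.

```c
#include <stddef.h>

/* Hypothetical function identifiers for the program in the post. */
enum {
    FN_MAIN, FN_MACHINE_EXISTS, FN_LIBRARY_F, FN_EXIT, FN_STRFTIME,
    NFUNCS
};

struct edge { int caller, callee; };

/* Static references only: what main() mentions, not what it executes. */
static const struct edge edges[] = {
    { FN_MAIN, FN_MACHINE_EXISTS },
    { FN_MAIN, FN_LIBRARY_F },
    { FN_MAIN, FN_EXIT },
};

/* Marks root and everything it can reach through recorded references. */
void mark_reachable(int root, int reachable[NFUNCS])
{
    size_t i;

    if (reachable[root])
        return;                 /* already visited; also stops cycles */
    reachable[root] = 1;
    for (i = 0; i < sizeof edges / sizeof edges[0]; i++)
        if (edges[i].caller == root)
            mark_reachable(edges[i].callee, reachable);
}
```

Starting from FN_MAIN with a zeroed array, FN_LIBRARY_F comes out marked even though the loop in main() never lets it run, while FN_STRFTIME stays unmarked. The analysis is conservative: it can say "never referenced", but it cannot say "never executed".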
gwyn@smoke.BRL.MIL (Doug Gwyn ) (03/27/89)
In article <16541@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>Given that the pANS does not have the concept of a `library', or
>even of `separate compilation', ...

The pANS does recognize the notion of library and separate compilation; see Section 2.1.1.1.

According to the pANS, a program consists of a set of translation units linked together and communicating by well-defined "external" interfaces. Nowhere in the pANS (that I could find) is there any idea that only a portion of a translation unit might be linked into a program. The means by which available translation units are selected for linking together into programs is not within the scope of the standard, although the usual link-editing of multiple object modules with others selected from libraries to satisfy external references is clearly among the methods envisioned.

I would say that any link process which dropped a portion of an object module (presumed to be produced from a single translation unit) would be non-standard conforming (unless the dropped portion had no detectable effect on the final program).
jfc@athena.mit.edu (John F Carr) (03/27/89)
In article <16546@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>Given that the pANS does not have the concept of a `library', or
>>even of `separate compilation', ...

>The VAX-assembly-code function foo() [deleted] calls whichever function is
>linked immediately following it, so eliding that function because it
>appears unused changes the execution.

Does the standard have anything to say about linking to programs written in other languages, or even compiled by different compilers? Would a compiler & environment (i.e. linker) that loaded by C function instead of file (and therefore broke the deleted example) be conforming? Assume that this hypothetical compiler works correctly on all C programs.

A more important problem is this: there are at least two strategies for passing structures to and from functions. One is to pass the structure on the stack, the other is for the caller to pass a pointer. Modules compiled using different methods will not work together. Does the standard offer any guidance in this case? As long as not all compilers are bug-free, there will be reasons to use different compilers on parts of the same program.

(My guess at the answer to the above questions: "the standard can not attempt to define behavior when different compilers are used for different source files, or when interacting with languages other than standard C.")
--
John Carr             "When they turn the pages of history,
jfc@Athena.mit.edu     When these days have passed long ago,
bloom-beacon!          Will they read of us with sadness
athena.mit.edu!jfc     For the seeds that we let grow?"
                                            --Neil Peart
henry@utzoo.uucp (Henry Spencer) (03/28/89)
In article <10126@bloom-beacon.MIT.EDU> jfc@athena.mit.edu (John F Carr) writes:
>(My guess at the answer to the above questions: "the standard can not attempt
>to define behavior when different compilers are used for different source
>files, or when interacting with languages other than standard C.")

Right. It is not guaranteed that it will even be possible. (Plausible example: an interpretive implementation might not support such things at all.) These are "quality of implementation" issues.
--
Welcome to Mars!  Your        | Henry Spencer at U of Toronto Zoology
passport and visa, comrade?   | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
Tim_CDC_Roberts@cup.portal.com (03/28/89)
I think that this discussion of external linkages has missed the point of the original poster, although I could be missing the point too.

If I read the original correctly, he is saying that given:

    MY.LIB <== A.OBJ   entry points A1 A2 A3   externals C1
               B.OBJ   entry points B1 B2 B3
               C.OBJ   entry points C1 C2 C3

and given

    main () { a1(); a3(); }

then the Turbo C linker will include ALL 3 modules in the resulting executable, whereas it is plain that B.OBJ is not required. If this is, in fact, the case, then the Turbo C linker is _broken_. The Microsoft Linker will include only A.OBJ and C.OBJ in the executable.

Now, if A, B, and C are all modules on a single OBJ, and that OBJ is fed to the linker, then one would expect all three to appear on the executable. I don't think that was the question, however.

Tim_CDC_Roberts@cup.portal.com                | Control Data...
...!sun!portal!cup.portal.com!tim_cdc_roberts |   ...or it will control you.
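The library behavior described for MY.LIB can be sketched as a small model in C. This is a toy under stated assumptions, not the algorithm of the Turbo C or Microsoft linker: `struct member`, `resolve()` and `link_main()` are hypothetical names, the symbol lists are copied from the table above, and symbol names are written in upper case throughout.

```c
#include <string.h>

#define MAX_SYMS 4   /* room for three names plus a terminating NULL */

struct member {
    const char *name;
    const char *defines[MAX_SYMS];  /* entry points, NULL-terminated */
    const char *needs[MAX_SYMS];    /* externals, NULL-terminated */
};

static const struct member my_lib[3] = {
    { "A.OBJ", { "A1", "A2", "A3" }, { "C1" } },
    { "B.OBJ", { "B1", "B2", "B3" }, { NULL } },
    { "C.OBJ", { "C1", "C2", "C3" }, { NULL } },
};

static int defines_symbol(const struct member *m, const char *sym)
{
    int i;

    for (i = 0; i < MAX_SYMS && m->defines[i] != NULL; i++)
        if (strcmp(m->defines[i], sym) == 0)
            return 1;
    return 0;
}

/* Pulls in the member that defines sym, then resolves that member's
   own externals. A member is pulled in whole or not at all. */
static void resolve(const struct member *lib, int n, const char *sym,
                    int *included)
{
    int i, j;

    for (i = 0; i < n; i++) {
        if (!defines_symbol(&lib[i], sym))
            continue;
        if (included[i])
            return;             /* already linked; also stops cycles */
        included[i] = 1;
        for (j = 0; j < MAX_SYMS && lib[i].needs[j] != NULL; j++)
            resolve(lib, n, lib[i].needs[j], included);
        return;
    }
}

/* Models main() referring to A1 and A3, as in the post. */
void link_main(int included[3])
{
    resolve(my_lib, 3, "A1", included);
    resolve(my_lib, 3, "A3", included);
}
```

With a zeroed `included` array, link_main() selects A.OBJ and C.OBJ and leaves B.OBJ out. Note that A2, C2 and C3 still come along, because selection happens per member and not per function.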
bobmon@iuvax.cs.indiana.edu (RAMontante) (03/28/89)
Tim_CDC_Roberts@cup.portal.com <16315@cup.portal.com> : [ condensed ]
-
-If I read the original correctly, he is saying that given:
-
-    MY.LIB <== A.OBJ   entry points A1 A2 A3   externals C1
-               B.OBJ   entry points B1 B2 B3
-               C.OBJ   entry points C1 C2 C3
-
-    main () { a1(); a3(); }
-
- [ ... ]
-
-Now, if A, B, and C are all modules on a single OBJ, and that OBJ is fed
-to the linker, then one would expect all three to appear on the executable.

I didn't mean all entries in a LIBrary, I meant all modules in the same original OBJ. In fact the simplest "fix" is to make a library out of it.

The question was: is such behavior (linking everything in the OBJ) necessary for some reason, or is it more likely to be a hack for speed/simplicity of compilation (or a bug)?

Sorry I wasn't clear the first time.
Devin_E_Ben-Hur@cup.portal.com (03/29/89)
> I think that this discussion of external linkages has missed the point of
> the original poster, although I could be missing the point too.
>
> If I read the original correctly, he is saying that given:
>
>     MY.LIB <== A.OBJ   entry points A1 A2 A3   externals C1
>                B.OBJ   entry points B1 B2 B3
>                C.OBJ   entry points C1 C2 C3
>

Nope, the original poster made no mention of libraries. He wished the linker to treat:

    LINK A+B+C,P.EXE;

as if all the functions in A, B, & C were independently compiled, then made into a library and linked only if referenced.

> and given
>
>     main () { a1(); a3(); }
>
> then the Turbo C linker will include ALL 3 modules in the resulting
> executable, whereas it is plain that B.OBJ is not required. If this
> is, in fact, the case, then the Turbo C linker is _broken_. The Microsoft
> Linker will include only A.OBJ and C.OBJ in the executable.
>

The turbo linker will perform just like the uSoft linker for this. Even if it did include B.OBJ, it would not be _broken_, merely an inferior implementation. A broken linker produces an incorrect program; including b.obj does not make the output incorrect, merely larger than necessary.

> Now, if A, B, and C are all modules on a single OBJ, and that OBJ is fed
> to the linker, then one would expect all three to appear on the executable.
> I don't think that was the question, however.
>
> Tim_CDC_Roberts@cup.portal.com                | Control Data...
> ...!sun!portal!cup.portal.com!tim_cdc_roberts |   ...or it will control you.

Devin_Ben-Hur@Cup.Portal.Com
...ucbvax!sun!portal!cup.portal.com!devin_ben-hur
bright@Data-IO.COM (Walter Bright) (03/30/89)
In article <18980@iuvax.cs.indiana.edu> bobmon@iuvax.cs.indiana.edu (RAMontante) writes:
>The question was: is such behavior (linking everything in the OBJ)
>necessary for some reason, or is it more likely to be a hack for
>speed/simplicity of compilation (or a bug)?

It is expected behavior that if you specify a .OBJ file to the linker, it'll link it in. Suppose, for example, you create a C file that has only

    static char copyright[] = "Copyright (C) by XYZ Corp";

in it, and you want that string embedded in the resulting EXE file. This file would be compiled and then placed in the list of OBJs to be linked together. If the linker ignored it, because it didn't satisfy any unresolved externals, then that is a BUG. The order in which OBJs are specified to the linker is also important. The purpose of library files is to link in only the object files necessary to resolve any remaining undefined externals.

On a related issue, the structure of an OBJ file closely follows that of an assembly language source file, i.e. it is *not* organized as a sequence of functions. OBJ files are a sequence of bytes, with public and external symbols. What the function boundaries are, or even whether the bytes represent code or data, is irrelevant to the format of the OBJ file. Expecting object files to have more structure to them is nice for the future, but for now and for compatibility with existing practice, it's impractical.

This lack of structure in the OBJ file is a major obstacle when creating a symbolic debugger. So everyone who does a symbolic debugger has invented extensions to the format in order to add structure. Unfortunately, the problems are:

1. This is only added if symbolic debug info is requested.
2. It adds quite a bit to the size of the file, slowing down linking.
3. Microsoft and Borland have decided to keep their formats secret, thus doing a major disservice to the community. (Let's hear it for open standards!)

P.S. My comments apply to OBJ files on MS-DOS; I'm not familiar with COFF.
dg@lakart.UUCP (David Goodenough) (04/14/89)
Guts of argument:

    file1:  proc1() { proc3(); }
    file2:  proc2() { }
    file3:  proc3() { }

N.B. file2 is not necessary to resolve inclusion of file1.

The question: Should inclusion of file2 on the command line cause inclusion of the code for proc2, even though it is not needed to resolve any undefined labels?

The answer: (IMHO) Yes. There is a difference between object files (UNIX .o) and libraries (UNIX .a). ALL stuff in a .o should be included because it may be needed to resolve a forward reference. When I write programs, I can produce just 14K of object from 20 source files (OK, I'm using Z80 assembler, but the principle still holds true in any environment), and I have external references all over hell's half acre. Now I don't want any damn linker trying to second guess what I mean.

As far as I know, the L80 linker ( father of the MS-DOS mess????? ) had a /S option to search. So if I said:

    L80 FILE1,FILE2,FILE3/S .....

FILE1.REL and FILE2.REL would be linked in their entirety, needed or not, but FILE3 would be searched, and only used to resolve current undefined labels. The linker I use now (ZLINK) does it too, but automatically, based on the filename extension: .O for mandatory linkage, and .L for libraries. Also the internal format of a .L file is a little different, but that's another story.
--
    dg@lakart.UUCP - David Goodenough               +---+
                                            IHS     | +-+-+
    ....... !harvard!xait!lakart!dg                 +-+-+ |
    AKA: dg%lakart.uucp@xait.xerox.com                +---+
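The distinction between mandatory files and searched files can be modelled in a few lines of C. This is a sketch of the rule as Goodenough states it, not of L80's or ZLINK's internals: `struct object`, its `search_only` flag and `select_objects()` are hypothetical, and each file is assumed to define one symbol and need at most one, matching the three-file example above.

```c
#include <string.h>

struct object {
    const char *defines;   /* the one public symbol this file defines */
    const char *needs;     /* the one external it refers to, or NULL */
    int search_only;       /* nonzero if given with a /S-style flag */
};

/* Fills included[i] with 1 for every file that ends up in the output. */
void select_objects(const struct object *objs, int n, int *included)
{
    int i, j, changed;

    /* Files named without the search flag are linked unconditionally. */
    for (i = 0; i < n; i++)
        included[i] = !objs[i].search_only;

    /* Searched files are added only to satisfy an outstanding need. */
    do {
        changed = 0;
        for (i = 0; i < n; i++) {
            if (!included[i] || objs[i].needs == NULL)
                continue;
            for (j = 0; j < n; j++) {
                if (!included[j] &&
                    strcmp(objs[j].defines, objs[i].needs) == 0) {
                    included[j] = 1;
                    changed = 1;
                }
            }
        }
    } while (changed);
}
```

For the command line in the post (FILE1 and FILE2 mandatory, FILE3 searched) all three files are selected, since proc1 needs proc3. If FILE2 were also marked search-only, it would be left out, because nothing refers to proc2.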