LEICHTER-JERRY@YALE.ARPA.UUCP (06/18/86)
Reply-To: <LEICHTER-JERRY@YALE.ARPA>

> >The use of this "first function will be transfer address" feature should
> >probably be avoided - it's non-portable (though I'd guess it's in VAX C
> >exactly because other implementations did this - Unix is one, in fact -
> >and programs came to rely on it).
> >                                                      -- Jerry
>
> Don't badmouth UNIX so quickly. Here's what happens in SysV:
>
>     $ cat foo.c
>     foo()
>     {
>         printf("hello, world!\n");
>     }
>     $ cc -o foo foo.c
>     undefined                   first referenced
>      symbol                         in file
>     main                        /lib/crt0.o
>     ld fatal: Symbol referencing errors. No output written to foo
>
> The interpretation of this is that the UNIX linker was looking for a
> main() to satisfy the reference by the startup routines contained in the
> object module /lib/crt0.o (there is a _start() routine in /lib/crt0.o
> which the linker treats as a hardwired "transfer address" and which calls
> main()). The linker did not find main() so it barfed. Simple, no?

When I discovered that VAX C made the first routine the entry point by
default, I figured there was SOME reason for going to the trouble, so I
pulled out my 4.2bsd documentation and checked out ld. It says this:

    [In the command description] The entry point of the output is the
    beginning of the first routine (unless the -e option is specified).

    [The description of -e] The following argument is taken to be the
    name of the entry point of the loaded program; location 0 is the
    default.

Note that these two statements are not obviously consistent, and, in fact,
had better disagree on any system without something like separate I&D
spaces (else the "first routine" would have to be at 0, hence its address
would be at 0 - but that's NULL!) Nor do they account for main(). I see no
reference at all to _start().

Since you mentioned System V, I checked some AT&T 3B2 ld documentation I
have here. The only references to the entry point are as follows:

    [The description of -e epsym]: Set the default entry point address for
    the output file to be that of symbol epsym.

    [Under Caveats] When the link editor is called through cc(1), a
    startup routine is linked with the user's program. [The startup
    routine arranges to call exit(), etc.; no mention of entry points.]

Again, no reference to _start() - or to what the entry point would be if
-e were left out. The documentation of the cc command doesn't say either.
But note that your "simple" example, and hardwired entry point, are
apparently NOT ld's doing, but cc's!

At this point, I got curious to see what the reality of the situation was.
So I tried your little foo program out on our local Celerity (4.2bsd).
"cc foo" produces "Undefined: _main", and running the resulting a.out
produces an immediate "Invalid address". However, a foo.o gets left
around. So I did an "ld foo.o". This led to "Undefined: _printf". Well,
getting there. I tried "ld foo.o /lib/libc.a". No errors! Running a.out
produces "Hello world", followed by an access violation. Adding an
explicit exit(0) fixes that nicely. In fact, the resulting program even
receives its command line arguments properly! (Well, it gets argc; I
didn't bother to check argv, but it's pretty certain to be correct - both
are coming from the Shell's exec.)

My 3B2 is down at the moment so I have no System V implementation to try
it on, but are you still going to bet that the first routine WON'T end up
as the entry point?

The thing that's so wonderful about Unix is its portability. And
consistency. And documentation. And, of course, the legions of Unix users
who can be counted on to view ANY non-laudatory mention of Unix as
"badmouthing" it. How is the comment - right or wrong - that Unix makes
the default entry point the first routine "badmouthing" Unix? At worst, I
was claiming that a lot of non-portable C code got written under Unix
(since K&R certainly contains nothing to indicate that there can be an
entry point other than main()). And if you don't believe THAT, then you
haven't looked at much Unix code.

							-- Jerry
-------
jso@edison.UUCP (John Owens) (06/23/86)
-- So I pulled out my 4.2bsd documentation and checked out ld. It says this:
--
-- [In the command description] The entry point of the output is the
-- beginning of the first routine (unless the -e option is specified).
--
-- [The description of -e] The following argument is taken to be the
-- name of the entry point of the loaded program; location 0 is the
-- default.
--
-- Note that these two statements are not obviously consistent [....]
-- the "first routine" would have to be at 0, hence its address would be
-- at 0 - but that's NULL!) Nor do they account for main(). I see no
-- reference at all to _start().

Your 4.2bsd documentation set (if you got it from Berkeley) refers to
VAXen only, where the first routine *is* at location 0; the loader just
loads sequentially. (C does *not* guarantee that address 0 doesn't
contain anything, but that's another discussion.) The loader itself
knows nothing about main or start; those are features of C, and have
nothing to do with any other language that ld might load. You just
might be writing in assembler....

-- Since you mentioned System V, I checked some AT&T 3B2 ld documentation I
-- have here. The only references to the entry point are as follows:
-- [....]
-- Again, no reference to _start() - or to what the entry point would be if
-- -e were left out. The documentation of the cc command doesn't say either.
-- But note that your "simple" example, and hardwired entry point, are
-- apparently NOT ld's doing, but cc's!

The documentation seems to be lacking here. (I've never been very fond
of ATTIS's rewritten documentation.) The definition of the C language
really does require that main be the starting point; I suppose that
didn't need to be part of the man page. start is not a user-visible
feature, and certainly doesn't have to have that name. The hardwired
entry point is a ld feature; the reference to main a feature of C.
Nonetheless, the entry point will still be the first routine. Read
on....

-- I tried your little foo program out on our local Celerity (4.2bsd).
-- "cc foo" produces "Undefined: _main", and running the resulting a.out
-- produces an immediate "Invalid address". However, a foo.o gets left
-- around. So I did an "ld foo.o". This led to "Undefined: _printf".

When cc invokes ld, it looks something like this:

    /bin/ld /lib/crt0.o foo.o -lc    [-X flags and such left off]

The crt0 file is loaded at address 0, and refers to main and exit.
foo.o must satisfy main, and /lib/libc.a will satisfy printf and exit.

-- Well, getting there. I tried
-- "ld foo.o /lib/libc.a". No errors! Running a.out produces "Hello world",
-- followed by an access violation. Adding an explicit exit(0) fixes that.

This is certainly not supported. You were lucky. It's dependent on
the implementation of the exec(2) system call whether or not you'll
get your command line arguments this way.

-- [...] but are you
-- still going to bet that the first routine WON'T end up as the entry point?

I won't bet on anything if the loader isn't invoked properly....

-- At worst, I was claiming that a lot of non-portable C code got written
-- under Unix (since K&R certainly contains nothing to indicate that
-- there can be an entry point other than main()). And if you don't
-- believe THAT, then you haven't looked at much Unix code.

That code you've been looking at is going to have a hard time being
ported to most UNIX systems then, much less any other system with a C
compiler. I've been porting, adapting, and randomly mangling C code
for UNIX from a variety of sources for years, and haven't run into a
single program that doesn't have an entry point of main(). Would you
refer me to such a program that I might have access to, like something
from USENET, a USENIX tape, or a System V or BSD distribution?

-- -- Jerry

John Owens @ General Electric Company
edison!jso%virginia@CSNet-Relay.ARPA            [old arpa]
edison!jso@virginia.EDU                         [w/ nameservers]
jso@edison.UUCP                                 [w/ uucp domains]
{cbosgd allegra ncsu xanth}!uvacs!edison!jso    [roll your own]
LEICHTER-JERRY@YALE.ARPA (06/26/86)
To: John Owens <edison!jso%virginia.csnet@CSNET-RELAY.ARPA>
In-Reply-To: John Owens <edison!jso%virginia.csnet@CSNET-RELAY.ARPA>, Mon, 23 Jun 86 09:34:42 edt
In general, I agree with what you say. A couple of small comments:
> C does *not* guarantee that address 0 doesn't contain anything, but
> that's another discussion.
C DOES guarantee that the integer constant 0, cast to any pointer type, will
never be equal to a pointer to any actual object of that type. In principle,
the cast could change the bit pattern; it almost never does - certainly it
does not on a VAX. Thus, _start == NULL. Most users will never see this,
but an implementer of _start() would. (Minor point, but the fact is there IS
an inconsistency - nothing keeps you from doing an extern void _start() and
looking at the resulting pointer.)
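A minimal sketch in C of the guarantee described above: the constant 0, converted to a pointer type, compares unequal to the address of any actual object. The function name is made up for illustration and does not come from the original discussion; the sketch says nothing about what bit pattern a given machine uses for the null pointer.

```c
#include <stddef.h>

/*
 * Returns 1 when the address of a real object differs from the null
 * pointer, written both as a cast of the constant 0 and as NULL.
 * The name is hypothetical, chosen only for this example.
 */
int object_address_is_not_null(void)
{
    int object = 42;
    int *p = &object;

    return p != (int *)0 && p != NULL;
}
```

On a conforming implementation this function returns 1 for any object, which is the language-level guarantee being argued about; whether a linker symbol such as the startup routine can sit at address 0 is a separate, implementation-level question.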
> -- Well, getting there. I tried
> -- "ld foo.o /lib/libc.a". No errors! Running a.out produces "Hello
> -- world", followed by an access violation. Adding an explicit exit(0)
> -- fixes that.
> This is certainly not supported. You were lucky. It's dependent on
> the implementation of the exec(2) system call whether or not you'll
> get your command line arguments this way.
Actually, I've since been informed that, while argc is passed correctly, argv
is screwy and envp isn't there at all.
> -- [...] but are you
> -- still going to bet that the first routine WON'T end up as the entry
> -- point?
> I won't bet on anything if the loader isn't invoked properly....
That gets to the crux of things: The "proper" way to invoke the loader is
undocumented - you must use cc. How then do you deal with a program written
in multiple languages? Basically, you ask a wizard....
I find it rather amusing that Unix, which (quite properly) argues for separate
modules with separate functions, and clean interfaces between them, glues the
loader and the C compiler together in a very ad hoc, undocumented way! (Side
comment: You at least understand what Unix is doing here. I had a couple of
other correspondents on this issue who had no real idea what was going on, and
ended up effectively claiming that the loader really is part of the compiler.
If that's the case, (a) it's going to be very hard to deal with multiple
compilers, ever; (b) it becomes hard to justify why the loader doesn't do more
to help the compiler/user out - e.g., check for type clashes in external
function calls. This would have been trivial to do if the implementers had
wanted to, with minimal overhead, and much faster than lint. Yes, it would
have required additional facilities in C - argument definitions as in ANSI C -
but then the language, compiler, and loader were developed by the same people
at the same time. As for those other correspondents, their lack of knowledge
didn't slow them down a bit in defending their incorrect religious
statements....)
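To illustrate the "argument definitions as in ANSI C" mentioned above, here is a small sketch of what a prototype buys at compile time. It shows only the compiler-side conversion of an argument, not the link-time type check proposed in the text, and the function names are hypothetical.

```c
/* Prototype: tells the compiler that half() takes a double. */
double half(double x);

/* Calls half() with an int constant while the prototype is in scope. */
double half_of_three(void)
{
    /* Because of the prototype, the int 3 is converted to 3.0 here. */
    return half(3);
}

double half(double x)
{
    return x / 2.0;
}
```

With the prototype visible, half_of_three() returns 1.5. Under a K&R-style declaration with no argument types, the same call would pass an int where a double is expected, and nothing in the compiler or loader of the time would complain.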
> -- At worst, I was claiming that a lot of non-portable C code got written
> -- under Unix (since K&R certainly contains nothing to indicate that
> -- there can be an entry point other than main()). And if you don't
> -- believe THAT, then you haven't looked at much Unix code.
> That code you've been looking at is going to have a hard time being
> ported to most UNIX systems then, much less any other system with a C
> compiler. I've been porting, adapting, and randomly mangling C code
> for UNIX from a variety of sources for years, and haven't run into a
> single program that doesn't have an entry point of main(). Would you
> refer me to such a program that I might have access to, like something
> from USENET, a USENIX tape, or a System V or BSD distribution?
If you read more closely what I said, you'll see that I didn't claim to have
any examples of this kind of thing...I just claimed that, somewhere out
there, they were likely to exist. I know the people who did the VAX C
compiler and run-time support, and they've tried really hard to be
compatible with Unix. Unfortunately, that can be very hard to do, since
Unix programs make use of a lot of undocumented "features". For example:
There is absolutely nothing in any definition of C that says that in:
    f(a,b)
    int a,b;
    {   int *x;
        x = &a + 1;
        ...
    }
x will point to b. In a field-test version of VAX C V2.0, this was NOT true.
(The VMS procedure-call spec says that the argument list is owned by the
CALLING procedure, which may place it in read-only memory, re-use it, etc.;
the CALLED procedure may only read it. In that version of C, if you ever
took the address of a formal argument, the value passed was copied to a
temporary cell on entry, and the address you got was of the temporary. As
far as documented C semantics are concerned, this is a completely correct
implementation - but it prevents you from screwing with the caller's argument
list.) Anyway, cries of pain came from all over: Despite the existence of
varargs - which WAS provided with that release, BTW - it turns out that there
are LOTS of C programs that assume you can scan through an argument list this
way. So the final version of V2 put things back as they were, requiring a
waiver of conformance with this aspect of the procedure-call spec. (As it
happens, VAX C (currently) always builds argument lists on the stack and then
discards them, so you can screw around to your heart's content - but try it
with a FORTRAN caller, and things get really weird....)
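For contrast with stepping through the argument list from &a, here is a hedged sketch of the portable route. It uses <stdarg.h>, the ANSI C successor to the varargs facility mentioned above, rather than the original <varargs.h> header; sum_ints is a hypothetical name chosen for this example.

```c
#include <stdarg.h>

/*
 * Sum `count` int arguments. The count is passed explicitly, so the
 * function never has to guess how the caller laid out its arguments.
 */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next int argument */
    va_end(ap);

    return total;
}
```

Calling sum_ints(3, 10, 20, 30) yields 60. The point of the design is that only va_start, va_arg and va_end know where the arguments live, so the code keeps working whether the implementation copies arguments to temporaries or leaves them in the caller's list.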
Anyway, given that Unix programmers have historically grasped at ANYTHING they
can find the least justification for in the documentation - or no
justification at all - "compatibility" has to mean "put in EVERYTHING
you can, even if
you can't think of anyone who's using it. Someone will come along who wants
it, some day...." Since the "entry point is the first routine" IS, in fact,
documented - even if only for wizards! - supporting it couldn't hurt....
> John Owens @ General Electric Company
-- Jerry
-------
bzs%bu-cs.bu.edu@CSNET-RELAY.ARPA.UUCP (06/29/86)
>From: LEICHTER-JERRY@YALE.ARPA
...an attempt to explain _start(), ld, C, null pointers etc...
There are so many horrendous mistakes in this article it would take
a month to straighten it out.
Suffice it to say I simply hope no one takes it seriously; he tries to
state things as if he knows what he is talking about, but he doesn't.
Perhaps he could post his article to INFO-C or net.lang.c and
find out how far off it is on almost everything.
Trust me folks, this is one to ignore.
-Barry Shein, Boston University
[Must I have the energy to go point by point just to warn readers? no.]
kaiser@FURILO.DEC.COM (Systems Consultant) (07/01/86)
>>From: LEICHTER-JERRY@YALE.ARPA
>> ...an attempt to explain _start(), ld, C, null pointers etc...
>
>...
>Trust me folks, this is one to ignore.
>
> -Barry Shein, Boston University
>
>[Must I have the energy to go point by point just to warn readers? no.]

Thank you, Barry, for your extremely factual and informative comment. I'm
afraid you have the wrong answer for your final question, however, which
falls under the rule of "put up or shut up". Otherwise it's just libel.

---Pete

Kaiser%furilo.dec@decwrl.dec.com   decwrl!furilo.dec.com!kaiser
DEC, 2 Iron Way (MRO3-3/G20), Marlboro MA 01752   617-467-4445
LEICHTER-JERRY@YALE.ARPA.UUCP (07/01/86)
> >From: LEICHTER-JERRY@YALE.ARPA
> ...an attempt to explain _start(), ld, C, null pointers etc...
> There are so many horrendous mistakes in this article it would take
> a month to straighten it out.
> Suffice it to say I simply hope no one takes it seriously; he tries to
> state things as if he knows what he is talking about, but he doesn't.
> Perhaps he could post his article to INFO-C or net.lang.c and
> find out how far off it is on almost everything.
> Trust me folks, this is one to ignore.
> -Barry Shein, Boston University
> [Must I have the energy to go point by point just to warn readers? no.]
I don't know about the energy, you seem to have plenty of that. Things worth
saying, now, that's another issue.
When you've demonstrated any knowledge of anything related to this issue, I
might take you seriously. But I have yet to see anything from you beyond the
typical "Unix is the solution, everything else is just the problem".
Grow up.
-- Jerry
-------
jso@edison.UUCP (John Owens) (07/07/86)
I agree. Certainly there are many UNIX programs that take advantage of
specific non-portable features! I wish that more people would use
varargs, but old habits die hard. varargs being a fairly recent
innovation, many people won't use it just because they're likely to
find more UNIX implementations that'll work the old way than those
with varargs, since everyone wants to stay compatible with existing
programs!

I finally went and read the start() routine, and was surprised to find
that it was written partially in C, with a few "asm" directives. Many
other UNIX implementations, such as the PDP-11 ones, have this written
in assembler, with no public symbol for the first location. (The name
_start comes from the fact that the C compiler prepends an underscore
to any external symbol.)

WRT mixing languages, I've found that just as cc will run as on any
assembler files, then invoke the loader appropriately, f77 will invoke
the c compiler on any .c files and the assembler on any .s files, etc.
The (admittedly ad-hoc) solution has usually been to have the interface
to your "new" language handle the linking with C modules, and any
others you know about. Even without this, you can always do something
like:

    cc a.c b.c c.c
    f77 d.f e.f
    mod2 f.m g.m
    -- and then, if your main program is C, with Fortran and Modula-2 routines --
    cc a.o b.o c.o d.o e.o f.o g.o
    -- or if it's Fortran with C and Modula-2 routines --
    f77 a.o b.o c.o d.o e.o f.o g.o
    -- etc, since any interface should just pass .o files on to ld --

One of the things I actually liked about VMS (really! it does have a
few redeeming qualities) was that all the language interfaces were
extremely standardized, to the point that the main rtl was common to
all languages. Of course it didn't have to be so cumbersome an
interface....

In hope of universitality-
-John (edison!jso%virginia@CSNET-RELAY.ARPA)
jsdy@hadron.UUCP.UUCP (07/15/86)
Newsgroups: net.lang.c
Subject: Re: Request for Comments
Summary: Combatting nonsense!

In article <870@bu-cs.UUCP> bzs@bu-cs.UUCP (Barry Shein) writes:
>Any comments? This is taken from INFO-VAX (mod.computers.vax):
>Path: bu-cs!harvard!caip!think!nike!styx!YALE.ARPA!LEICHTER-JERRY
>From: LEICHTER-JERRY@YALE.ARPA
>Subject: Re: main() and entry points in C
>Date: Thu, 26-Jun-86 08:50:52 EDT
>
> ... Thus, _start == NULL. Most users will never see this,
>but an implementer of _start() would.

If some implementation of C uses _start instead of start as an entry
point, they are asking for trouble. I have never seen a C-accessible
symbol as an entry. The entry point, start, used to be almost always 0;
but it isn't ever 0, now, in the latest releases of AT&T System V Unix.

> -- still going to bet that the first routine WON'T end up as the entry
> point?
> I won't bet on anything if the loader isn't invoked properly....
>That gets to the crux of things: The "proper" way to invoke the loader is
>undocumented - you must use cc. How then do you deal with a program written
>in multiple languages? Basically, you ask a wizard....

On many versions of Unix, it is documented that properly written
multiple-language modules can be compiled together by appropriately
calling the compiler:
    cc myc.c myftn.o ...
or
    f77 myftn.f myc.o ...
or even
    f77 myftn.f myc.c ...

> .. Since the "entry point is the first routine" IS, in fact,
>documented - even if only for wizards! - supporting it couldn't hurt....

You claim this is a documented undocumented feature? It isn't even true.
Specifically,
    never_called() { printf("Hi. Joe is wrong.\n"); }
    main() { printf("Hello world.\n"); }
will never call me a liar.

Even if you are talking about straight link-loaded objects (with no
header to call _main()), several common loaders allow one to specify
entry points other than 0. These include, but are not! limited to, the
AT&T System V ld (again); the CULC Fortran IV Plus linker; and the DEC
VMS linker.

The C header, which cc puts at the head of the compiled objects,
contains the entry point -- which is not at location 0 under all
versions of Unix! This header moves the argc, argv, and envp to a
location that main() will understand when called as a function; then
calls main() as a function; then calls exit() as a function whose
argument is main()'s return value; and if this doesn't exit, typically
tries to perform an exit system call itself, and then (in desperation)
a halt.

I should say it TYPICALLY does all this. If it ain't documented (and
especially if it's not in the standard), it ain't guaranteed, so don't
bet the lunch money, Mildred.
--
    Joe Yao     hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}
                jsdy@hadron.COM (not yet domainised)
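The sequence the C header is described as performing can be sketched in ordinary C. This is only an illustration under stated assumptions: real startup headers are system-specific and often written in assembler, and the names main_fn, startup_sketch and sample_main are hypothetical. The sketch returns the status instead of calling exit() so that it can be exercised like any other function.

```c
/* Hypothetical type for a main()-style entry function. */
typedef int (*main_fn)(int argc, char **argv, char **envp);

/*
 * Sketch of the startup header's job: hand argc, argv and envp to
 * main() as ordinary function arguments and collect its return value.
 * A real header would pass that value to exit() rather than return it.
 */
int startup_sketch(main_fn user_main, int argc, char **argv, char **envp)
{
    int status = user_main(argc, argv, envp);

    return status;
}

/* Hypothetical stand-in for a user's main(): reports its argc. */
int sample_main(int argc, char **argv, char **envp)
{
    (void)argv;   /* unused in this sketch */
    (void)envp;   /* unused in this sketch */
    return argc;
}
```

Passing sample_main with an argc of 2 makes startup_sketch return 2, which is the value a real header would hand to exit(). A program entered directly at its first function skips this step entirely, which is why the foo example earlier in the thread needed its own explicit exit(0).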
LEICHTER-JERRY@YALE.ARPA.UUCP (07/18/86)
I received a private copy of Joe Yao's message, with no indication that it
had also been forwarded to info-vax. The following is the response I sent
him, slightly amended. For info-vax readers, there is some repetition
here; my apologies. Unless someone brings up startling new facts, this is
my last message on this subject. I promise. :-)
							-- Jerry

> If some implementation of C uses _start instead of start as an entry
> point, they are asking for trouble. I have never seen a C-accessible
> symbol as an entry. The entry point, start, used to be almost always 0;
> but it isn't ever 0, now, in the latest releases of AT&T System V Unix.

Try the Celerity 4.2bsd port - their entry point is BOTH start (no "_")
and something like _crt0_start. The latter IS accessible from C - and has
a value of 0. I'm told that this is not uncommon, although I can't name
any other examples. I DO know that the Pyramid port - at least the 4.2bsd
universe version, I didn't try the System V one - uses start (no "_")
only.

> On many versions of Unix, it is documented that properly written
> multiple-language modules can be compiled together by appropriately
> calling the compiler:
>     cc myc.c myftn.o ...
> or
>     f77 myftn.f myc.o ...
> or even
>     f77 myftn.f myc.c ...

"Many versions" of Unix? I thought we were talking about one portable OS
here! :-)

> You claim this is a documented undocumented feature? It isn't even true.
> Specifically,
>     never_called() { printf("Hi. Joe is wrong.\n"); }
>     main() { printf("Hello world.\n"); }
> will never call me a liar.
>
> Even if you are talking about straight link-loaded objects (with no
> header to call _main()), several common loaders allow one to specify
> entry points other than 0. These include, but are not! limited to, the
> AT&T System V ld (again); the CULC Fortran IV Plus linker; and the DEC
> VMS linker.

Mr. Shein forwarded only the last of a series of messages on this topic.
He did this because he was interested in flames, not facts. (Mr. Shein
sent me a note saying he would forward my message, "without editorial
comment," to "the C experts" on info-c/net.lang.c. His interpretation of
"without editorial comment" allowed him to include a "summary line" of
"Combating nonsense". My, my. Mr. Yao's response is the first one, after
about 2 weeks, that actually talks about the issues - there were a couple
of others pointing out related bugs in various microprocessor C
implementations. There's been nothing further from Mr. Shein.)

The whole discussion started with a question from a VMS C user as to why,
if he didn't provide a main() routine, his VMS C code started at the first
function seen by the VMS linker. (Well, that wasn't QUITE his question -
others asked that, the original question shows up below.) As my original
response pointed out, the Unix linker does exactly the same thing, if you
invoke it "straight"; it's only because of the way cc chooses to invoke
the linker that stuff works out. If you do things the "supported" way in
VAX C, you get exactly the same results as if you do them the "supported"
way on Unix C. But, in fact, even if you go beyond the "supported"
approaches, the results on VMS are a pretty close approximation to what
you get on Unix.

> The C header, which cc puts at the head of the compiled objects,
> contains the entry point -- which is not at location 0 under all
> versions of Unix! This header moves the argc, argv, and envp to a
> location that main() will understand when called as a function; then
> calls main() as a function; then calls exit() as a function whose
> argument is main()'s return value; and if this doesn't exit, typically
> tries to perform an exit system call itself, and then (in desperation)
> a halt.

Again, all of these points were brought up in messages that Mr. Shein
chose to omit. Interestingly enough, the original message complained
that, with previous versions of VMS C, a program without a main(), called
at its first function, received its command line arguments - but that in
recent versions this didn't work "unlike Unix". In fact, this does NOT
work in Unix!

> I should say it TYPICALLY does all this. If it ain't documented (and
> especially if it's not in the standard), it ain't guaranteed, so don't
> bet the lunch money, Mildred.

K&R defines the semantics of C programs. They provide a definition for
how C programs start up (in main()). They also provide a definition for
how functions get called. Nowhere is there a definition of the semantics
of a complete program WITHOUT a main() function - implementations are on
their own here. The VMS implementation tries to emulate what most Unix
implementations seem to do. It comes pretty close.

> Joe Yao     hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}
>             jsdy@hadron.COM (not yet domainised)

							-- Jerry
-------