[comp.lang.c] _why_ does the UNIX linker not distinguish text and data addresses???

levy@mtcchi.uucp (2656-Daniel R. Levy(0000000)0000) (07/30/90)

Compile these two files together on a UNIX system:

::::::::::::::
a.c
::::::::::::::

main()
{
	(void) bogus();
	return 0;
}

::::::::::::::
b.c
::::::::::::::
int bogus;

$ cc -o ab a.c b.c
a.c:
b.c:
Linking:
$ ./ab
Illegal instruction - core dumped

So far I have tried this on a 3B2 (SVR3.1), a Sun4 (SunOS 4.0.3), and an
Amdahl (UTS 5.2.6b).  All three compile and link just as quietly.

Now my question is, why does the linker silently resolve the function reference
to the global variable without even a whisper of a warning?  Or, to partially
answer my own question, why doesn't the compiler put something in the symbol
table for a.c saying "hey, I expect 'bogus' to be a text address"?  (True,
this is obvious from the use of the symbol in the code, but the linker doesn't
examine the code, it just patches it according to the symbol table, right?)
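
To make the suggestion above concrete, here is a minimal sketch of a symbol
table whose entries carry a text/data tag, and a check that compares the tag
on a reference against the tag on the definition.  Everything in it
(sym_kind, struct symbol, check_reference) is hypothetical and invented for
illustration; it is not the layout of a.out, COFF, or any real linker.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol kinds; not taken from any real object-file format. */
enum sym_kind { SYM_TEXT, SYM_DATA };

struct symbol {
    const char   *name;
    enum sym_kind kind;   /* definition: where the symbol lives;
                             reference: what the referring code expects */
};

enum check_result { REF_OK = 0, REF_KIND_MISMATCH = 1, REF_UNDEFINED = -1 };

/* Resolve one reference against a list of definitions and report
   whether the expected kind agrees with the defined kind. */
enum check_result check_reference(const struct symbol *defs, size_t ndefs,
                                  const struct symbol *ref)
{
    size_t i;

    for (i = 0; i < ndefs; i++) {
        if (strcmp(defs[i].name, ref->name) == 0)
            return defs[i].kind == ref->kind ? REF_OK : REF_KIND_MISMATCH;
    }
    return REF_UNDEFINED;
}
```

With definitions {"main", SYM_TEXT} and {"bogus", SYM_DATA}, a reference
{"bogus", SYM_TEXT} from a.c comes back as REF_KIND_MISMATCH, which is the
point at which such a linker could warn.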

Yes, I know this is a stupid thing to do; I'm just asking hypothetically.
"lint" does identify the problem, though only in the same rather generic terms
("value type declared inconsistently") that it applies to more harmless
discrepancies between closely related datatypes.

True, the linker has to allow for tricks such as putting the strings of a
program into text space through diddling of the assembly (to cut down
on data space, which is not shared among processes) and dynamic linking of
code.  But why can't there be special assembler directives for these so the
linker can still catch boo-boos like that illustrated?
-- 
 Daniel R. Levy * Memorex Telex * Naperville IL * ..!uunet!tellab5!mtcchi!levy
So far as I can remember, there is not one      ... therefore be ye as shrewd
word in the Gospels in praise of intelligence.  as serpents <Gen. 3> and harm-
-- Bertrand Russell [Berkeley UNIX fortune]     less as doves -- God [MT. 10:16]

steve@taumet.com (Stephen Clamage) (07/31/90)

levy@mtcchi.uucp (Daniel R. Levy) writes:

|::::::::::::::
|a.c
|::::::::::::::
|main()
|{
|	(void) bogus();
|}

|::::::::::::::
|b.c
|::::::::::::::
|int bogus;

Traditional Unix object-file formats, and thus the linkers, are very
simple-minded.  They certainly could distinguish between text and
data addresses, but simply don't.  It is up to the programmer, even
in ANSI C, to take care that extern declarations/definitions in separate
compilation units match.  Lint will generally pick up mismatches.
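
One common way to keep declarations and definitions matching is to put a
single extern declaration in a header that every file includes.  The sketch
below folds the three pieces into one file so it stands alone; the header
name bogus.h and the function read_bogus are assumptions made up for the
example, not part of the original program.

```c
/* --- contents of a hypothetical shared header, bogus.h --- */
extern int bogus;          /* the one declaration every file sees */

/* --- b.c: the definition, checked against the declaration above --- */
int bogus = 42;

/* --- a.c: a user of the symbol, which also includes the header --- */
int read_bogus(void)
{
    /* (void) bogus();
       With the declaration in scope, a compiler rejects this call,
       because bogus is known to be an int and not a function. */
    return bogus;
}
```

Because a.c now sees that bogus is an int, the mistake is reported at compile
time instead of surfacing as a core dump at run time.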

In C++, BTW, this example will fail at link time no matter what linker
is used, since the type of the function is encoded by the compiler into
its true external name.
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

edward@ucbarpa.Berkeley.EDU (Edward Wang) (08/01/90)

Well, it allows data to be executed.  Rather, the difference
between text and data is that text can be read-only and shared,
not that data is not executable.  Given that, it would be
incorrect for the linker to signal an error.

The only impact this has on the language is that global
variables and functions must share the same name space.
The compiler makes no judgement on where you put functions
or variables.  With -R, variables can end up in the text
segment.  Compiled C code can be linked with hand-written,
even self-modifying, assembly code.  Therefore, it would be
incorrect for the compiler to specify where an undefined
symbol should come from.

Your program has a C error, no different from declaring
a variable as an int in one place and as a float somewhere else.
True, the compiler should catch it, but the current organization
makes it difficult.  (Use lint.)  The correct C code
(* (void (*)()) &bogus)() generates the same binary.
There's no way for the linker to tell.
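
As a rough illustration of the cast above, the sketch below turns the address
of a data object into a function pointer and checks that the address is
unchanged, without ever calling through it.  The helper names are invented
for the example.  ISO C does not define conversion between object and
function pointers, so this goes through uintptr_t and assumes a
flat-address-space implementation where that round trip preserves the value.

```c
#include <stdint.h>

int bogus;                      /* plain data object, as in b.c */

typedef void (*func_ptr)(void);

/* Reinterpret the address of a data object as a function pointer.
   The conversion via uintptr_t is implementation-defined; it is assumed
   to keep the address bits intact on the target system. */
func_ptr data_as_function(int *p)
{
    return (func_ptr)(uintptr_t)p;
}

/* Returns 1 if the function pointer and the data pointer convert to the
   same integer value, 0 otherwise. */
int same_address(func_ptr f, int *p)
{
    return (uintptr_t)f == (uintptr_t)p;
}
```

On such a system same_address(data_as_function(&bogus), &bogus) is expected
to return 1: the "function" and the variable are one and the same address,
which is all the symbol table records.  Actually calling through the pointer
is what produces the crash shown at the top of the thread.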

Anyway, it's not the linker's fault.  It's the compiler's fault.
If you consider lint to be part of the compiler, then it's your fault.

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/01/90)

In article <1990Jul30.104726.22660@mtcchi.uucp> levy@mtcchi.UUCP (Daniel R. Levy) writes:
>Now my question is, why does the linker silently resolve the function reference
>to the global variable without even a whisper of a warning?

The original UNIX linker "ld", written I think by Dennis Ritchie,
was remarkably simple compared to typical system linkers.  This
has since been fixed in UNIX System V COFF-based systems.

>Yes I know this is a stupid thing to do, I'm just asking hypothetically.

UNIX was never designed to keep people from doing stupid things,
because that policy would also keep them from doing clever things.

aryeh@eddie.mit.edu (Aryeh M. Weiss) (08/02/90)

In article <37909@ucbvax.BERKELEY.EDU> edward@ucbarpa.Berkeley.EDU.UUCP (Edward Wang) writes:
>Well, it allows data to be executed.  Rather, the difference
>between text and data is that text can be read-only and shared,
>not that data is not executable.  Given that, it would be
>incorrect for the linker to signal an error.
>

Under SCO Xenix 386 with the SCO linker, this does produce an error.
This is because of the segmented memory model used on the Intel
processors.  (The linker bombs with a `fixup' error.)  Text is text,
data is data, and never the twain shall meet, as they say.  Also, data is
not executable, at least by default.  The exception is small model impure
8086/80286 programs: when building those, the linker does not produce an
error.
