[comp.unix.internals] Loading and Executing Object Code at Runtime

sef@kithrup.COM (Sean Eric Fagan) (02/16/91)

In article <6073@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>What's "data space", and how is it different from any other sort of
>space?  (Most UNIX systems run with a flat address space on 386es.  PTEs
>on the 386 only have a "writable" bit.)

Xenix had split I&D for the '386, I believe.  As for '386 unices, even those
with "flat" address spaces don't really have them; what they do is set cs
and ds (and es, and ss) to point to the same memory range.  *However*, you
still cannot execute data; you have to execute code.  Consider it as an
alias of forms.

How does this affect people?  Well, consider the following code, which is
somewhat similar to code I ran into recently:

	push	byte1
	push	byte2
	; ...
	call	@esp

Oops.  ss is a readable and writable segment, not an executable segment.
Memory-fault, core-dump.  (Note:  to make it work, all you have to do is
spit out a segment prefix [a la 'call cs:@esp'].)

Anyway, just a bunch of nit-picking, because I can't fall asleep yet...

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

bzs@world.std.com (Barry Shein) (02/17/91)

From: sef@kithrup.COM (Sean Eric Fagan)
>Oops.  ss is a readable and writable segment, not an executable segment.
>Memory-fault, core-dump.  (Note:  to make it work, all you have to do is
>spit out a segment prefix [a la 'call cs:@esp'].)
>
>Anyway, just a bunch of nit-picking, because I can't fall asleep yet...

Point of information!

So what you're saying is that an (assembler, library) function could
be written which calls a data address and used by any program (on a
386)? Something similar to indir(), eg: call(addr,arg1,arg2,...,argn)?
-- 
        -Barry Shein

Software Tool & Die    | bzs@world.std.com          | uunet!world!bzs
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

cpcahil@virtech.uucp (Conor P. Cahill) (02/17/91)

sef@kithrup.COM (Sean Eric Fagan) writes:
>*However*, you still cannot execute data; you have to execute code. 
> Consider it as an alias of forms.

Obviously you cannot execute data since it probably doesn't make much
sense as a stream of instructions.  However, if you copied a function from
code to data space and then branched through a pointer to that data area,
it does work.  So you can execute from data space.  This works on ISC UNIX,
Bell Tech UNIX, SunOS and several other OS's.  I don't have SCO lying
around to try, but I would bet that it does in fact work.

Here is a sample program that will verify that it works:

Two notes about the program:

	1. Yes all error checking has been removed.  I'm
	2. Yes I know that it uses non-portable stuff.


main() {
	char * addr; char test[100]; char * malloc();
	int func1(); int func2(); int (*funcp)();

	strcpy(test,"YES will appear here:     if it worked\n");

	addr = malloc(3000);
	docopy(addr,func1,func2);
	funcp = addr;		/* you will get a warning about this line */
	(*funcp)(test);
	puts(test);
	exit(0);
}
docopy(tgt,src,srcend) 
	char *tgt; char*src; char *srcend; 
{
	while( src != srcend )
		*tgt++ = *src++;
}
int func1(str) char * str; { str[22] = 'Y'; str[23] = 'E'; str[24] = 'S';}
int func2(str) char * str; { str[22] = 'N'; str[23] = 'O'; }
-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

sef@kithrup.COM (Sean Eric Fagan) (02/17/91)

In article <1991Feb16.163527.25147@virtech.uucp> cpcahil@virtech.uucp (Conor P. Cahill) writes:
>	(*funcp)(test);

This is, if you will pardon the hand waving, "different."  This is an idiom
the compiler knows about, and it spits out the correct code.  Specifically,
it spits out code that uses cs, not ds.  On the '386, no matter how hard
you try, you cannot execute something in a writable segment!  The execute
bit and the writable bit are mutually exclusive (and if I had my '386 book
here, I'd remember why; I think they are the same bit or something weird
like that).  *However*:  you *can* alias two or more segments, and use
segment prefixes.  But unless you've done that, you cannot execute out of
your data space.

Fortunately (or otherwise), all '386 unices I've seen (except for '386
xenix) have a nice, flat address space, even though there are still two
segments.


sef@kithrup.COM (Sean Eric Fagan) (02/17/91)

In article <BZS.91Feb16112944@world.std.com> bzs@world.std.com (Barry Shein) writes:
>So what you're saying is that an (assembler, library) function could
>be written which calls a data address and used by any program (on a
>386)? Something similar to indir(), eg: call(addr,arg1,arg2,...,argn)?

*If* cs and ds (and ss, and es, at least) are aliased to the same chunk of
virtual memory, then, yes.  All indir has to do is look like this:

	call(void (*addr)()) {
		(*addr)();
	}

The compiler will spit out code that will work.  The case I had, again,
dealt with the compiler spitting out code that looked like

	call *%esp

On the '386, both ebp and esp use ss by default; and since ss is writable,
it cannot be executable.  What I have to do is get it to spit out a
segment-override prefix (namely, "cs:" 8-)).  (One thing I do have to make
sure about:  that spitting the "cs:" out does not cause a far call; that
would be *bad*.)

On the other hand, if you do not have cs aliased to ds (et al), then you
will be jumping to the wrong address.  (It will be the offset in the segment
you want, but in the wrong segment.)
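
As a rough sketch of the call(addr,arg1,...,argn) idea in plain C, assuming
a fixed arity of three long arguments rather than a true variable argument
list.  The names fn3_t, sum3 and call3 are made up for illustration and do
not come from any library:

```c
/* Hypothetical names throughout; a sketch, not a library interface. */
typedef long (*fn3_t)(long, long, long);

/* Example callee, present only so there is something to call. */
long sum3(long a, long b, long c)
{
    return a + b + c;
}

/* Call through a function pointer.  The compiler emits an ordinary
 * indirect near call, so control transfers to that offset within cs. */
long call3(fn3_t addr, long a, long b, long c)
{
    return (*addr)(a, b, c);
}
```

With these definitions, call3(sum3, 1, 2, 3) returns 6.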


guy@auspex.auspex.com (Guy Harris) (02/18/91)

>>	(*funcp)(test);
>
>This is, if you will pardon the hand waving, "different."  This is an idiom
>the compiler knows about, and it spits out the correct code.

I suspect just about *any* software that let you load and execute object
code at runtime from C would use said idiom - i.e., you ask it "load up
thus-and-such an object file", and then ask it "here's a function name,
give me a pointer to that function", and then call the function through
that pointer - so I suspect that just about *any* software that would
let you load and execute object code at runtime wouldn't be bitten by
that problem on a *86.  (For instance, the "dlopen()"/"dlsym()" code in
S5R4 works that way, and I suspect its moral equivalent in OSF/1 does so
as well.)
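
A minimal sketch of that look-up-then-call pattern with dlopen()/dlsym(),
assuming a POSIX system that provides <dlfcn.h> (some systems also need
-ldl at link time).  To avoid guessing at a library path, it passes NULL to
dlopen(), which yields a handle for the running program and the libraries
already loaded into it.  The names str_to_int_fn and lookup_str_to_int are
illustrative only:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical type: any function shaped like atoi(). */
typedef int (*str_to_int_fn)(const char *);

/* Ask the dynamic linker for a function by name and hand back a pointer
 * the caller can invoke with the usual (*fn)(arg) idiom.  Returns NULL
 * if the handle or the symbol cannot be obtained. */
str_to_int_fn lookup_str_to_int(const char *name)
{
    str_to_int_fn fn = NULL;
    void *handle = dlopen(NULL, RTLD_LAZY);   /* NULL: the program itself */

    if (handle == NULL)
        return NULL;

    /* Store the result through a void ** to avoid a direct cast from an
     * object pointer to a function pointer. */
    *(void **)(&fn) = dlsym(handle, name);
    return fn;
}
```

A caller would write something like fn = lookup_str_to_int("atoi") and then
(*fn)("42"), which is the same function-pointer call discussed above.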

meissner@osf.org (Michael Meissner) (02/18/91)

In article <1991Feb16.163527.25147@virtech.uucp> cpcahil@virtech.uucp
(Conor P. Cahill) writes:

| Obviously you cannot execute data since it probably doesn't make much
| sense as a stream of instructions.  However, if you copied a function from
| code to data space and then branched through a pointer to that data area,
| it does work.  So you can execute from data space.  This works on ISC UNIX,
| Bell Tech UNIX, SunOS and several other OS's.  I don't have SCO lying
| around to try, but I would bet that it does in fact work.

This doesn't always work.  On 88k systems, the 88Open standard
mandates that sections with execute access enabled do not have write
access enabled, and the reverse holds as well.  You have to do a system
call (memctl) to change access modes.  The reason for this is the fact
that the 88k is a 'Harvard' architecture, and has separate caches for
instructions and data.  Thus, even if you can write into an area and
jump to it, the cache may contain invalid data/instructions, because
it has no idea the memory changed underneath it.....
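
On POSIX systems the rough analogue of that access-mode switch is
mprotect(); this is not the 88k memctl interface, whose arguments are not
given here.  A minimal sketch, assuming mmap() with MAP_ANONYMOUS and the
GCC/Clang builtin __builtin___clear_cache() for the cache flush.  The name
stage_code is hypothetical, and the function only stages the bytes; it
does not jump to them:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* assumption: exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy len bytes into a fresh region while it is read/write, then switch
 * the region to read/execute.  Returns NULL on any failure. */
void *stage_code(const void *code, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t size;
    void *buf;

    if (len == 0 || page <= 0)
        return NULL;
    /* Round the length up to a whole number of pages. */
    size = (len + (size_t)page - 1) / (size_t)page * (size_t)page;

    buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    memcpy(buf, code, len);

    /* Write access off, execute access on. */
    if (mprotect(buf, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, size);
        return NULL;
    }

    /* Tell the instruction side that this memory has changed. */
    __builtin___clear_cache((char *)buf, (char *)buf + len);
    return buf;
}
```

Keeping the write and execute phases separate mirrors the rule described
above: the region is never writable and executable at the same time.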

Another potential problem is PC-relative addressing.  If you move a
code fragment that refers to an external/static memory address, the
code fragment in the new location will reference a different piece of
memory.
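
The arithmetic behind that can be sketched as follows, assuming an
x86-style instruction whose 32-bit displacement is taken relative to the
address of the next instruction.  The helper names rel32_for and target_of
and the addresses in the note below are invented for illustration:

```c
#include <stdint.h>

/* Displacement to encode so that an instruction at insn_addr, insn_len
 * bytes long, refers to target. */
int32_t rel32_for(uint32_t insn_addr, uint32_t insn_len, uint32_t target)
{
    return (int32_t)(target - (insn_addr + insn_len));
}

/* Address actually referenced by an instruction at insn_addr that
 * carries displacement rel. */
uint32_t target_of(uint32_t insn_addr, uint32_t insn_len, int32_t rel)
{
    return insn_addr + insn_len + (uint32_t)rel;
}
```

For a 5-byte instruction at 0x1000 that refers to 0x2000, rel32_for gives
0xFFB.  Copy the same bytes to 0x8000 without touching the displacement
and target_of reports 0x9000: the reference has moved by the same 0x7000
that the code moved.  Pointing it back at 0x2000 means re-encoding the
displacement as rel32_for(0x8000, 5, 0x2000).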
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?

gary@neptune.ctc.contel.com (Gary Bisaga x4219) (05/22/91)

In article <1991Feb16.213056.2632@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> In article <1991Feb16.163527.25147@virtech.uucp> cpcahil@virtech.uucp (Conor P. Cahill) writes:
> >	(*funcp)(test);
> 
> This is, if you will pardon the hand waving, "different."  This is an idiom
> the compiler knows about, and it spits out the correct code.  Specifically,
> it spits out code that uses cs, not ds.  On the '386, no matter how hard
> you try, you cannot execute something in a writable segment!  The execute
> bit and the writable bit are mutually exclusive (and if I had my '386 book
> here, I'd remember why; I think they are the same bit or something weird
> like that).  *However*:  you *can* alias two or more segments, and use
> segment prefixes.  But unless you've done that, you cannot execute out of
> your data space.

You cannot execute a writable segment because there is one bit that means
either (a) readable, or (b) writable, depending on whether the segment is
executable or not, respectively, which is decided by another bit.  And of
course, you're right, aliasing is the way this type of thing is normally
done.

Gary Bisaga (gary@ctc.contel.com)
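
The bit layout described above can be sketched as a small decoder for the
4-bit type field of a '386 code or data segment descriptor.  The macro and
function names are invented for this sketch; the bit positions are those of
ordinary (non-system) descriptors:

```c
#include <stdbool.h>
#include <stdint.h>

/* Type field of a code/data segment descriptor (hypothetical names). */
#define SEG_ACCESSED 0x1   /* set by the processor on first use          */
#define SEG_RW       0x2   /* data: writable;    code: readable          */
#define SEG_EC       0x4   /* data: expand-down; code: conforming        */
#define SEG_EXEC     0x8   /* 1 = code segment,  0 = data segment        */

bool seg_executable(uint8_t type)
{
    return (type & SEG_EXEC) != 0;
}

/* Only data segments can be writable; the shared bit selects it. */
bool seg_writable(uint8_t type)
{
    return !(type & SEG_EXEC) && (type & SEG_RW);
}

/* Data segments are always readable; code segments only if the shared
 * bit is set. */
bool seg_readable(uint8_t type)
{
    return !(type & SEG_EXEC) || (type & SEG_RW);
}
```

Because one bit serves both purposes, no value of the type field makes
seg_executable() and seg_writable() true together, which is why a second,
aliased descriptor is needed to run code that was written as data.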