mjy@sdti.UUCP (Michael J. Young) (01/21/88)
Is there a way in Unix to create an "alias" between the text and data segments of a process? More specifically, how does one go about executing a block of code that was generated in a data segment? I'm not really talking about self-modifying code, in which a program attempts to modify its own text segment, but rather self-generating code, in which a program "compiles" a block of code into its data segment (created via malloc() perhaps?) and then tries to execute it. An obvious application of this might be an incremental compiler, but I can think of other reasons why I might want to do this as well. -- Mike Young - Software Development Technologies, Inc., Sudbury MA 01776 UUCP : {decvax,harvard,linus,mit-eddie}!necntc!necis!mrst!sdti!mjy Internet : mjy%sdti.uucp@harvard.harvard.edu Tel: +1 617 443 5779
alex@umbc3.UMD.EDU (Alex S. Crain) (01/21/88)
In article <202@sdti.UUCP> mjy@sdti.UUCP (Michael J. Young) writes: > >Is there a way in Unix to create an "alias" between the text and data >segments of a process? More specifically, how does one go about executing a >block of code that was generated in a data segment? *** SYSTEM 5 **** Kyoto Common Lisp does this when it loads object code. The system builds object files where the first symbol in the text segment is a function that knows about all the other symbols in the file. There is an external loader that makes a copy of the .o file and resolves all external symbols against the lisp executable's symbol table. Lisp allocates space with brk(), and loads the .o file as data, and then branches to the start of the text area of the .o file, assuming that there is a function there that will put the rest of the symbols on the common obstack. Boy, do things get weird when the .o file is corrupted :-). But to answer the question: nothing. That is, lisp doesn't do anything special to accomplish this; it just works. There is a short file that demonstrates this behavior, which I can send you if you like. -- :alex. alex@umbc3.umd.edu
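A minimal sketch of the registration protocol described above, in ANSI C. It models only the calling convention: the loader jumps to the module's first function, which hands every other entry point to the host's symbol table. The names `module_init`, `register_symbol`, `lookup_symbol`, and `load_module` are hypothetical, not KCL's. The "module" here is ordinary compiled code reached through a function pointer, so the sketch deliberately leaves out the part under discussion (executing bytes that were loaded as data).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*lisp_fn)(int);                        /* simplified entry point type */
typedef void (*register_fn)(const char *, lisp_fn); /* callback given to the module */

struct symbol {
    const char *name;
    lisp_fn fn;
};

#define MAX_SYMBOLS 64
static struct symbol table[MAX_SYMBOLS];            /* host's symbol table */
static size_t nsymbols;

/* Host side: called back by the module's init function. */
void register_symbol(const char *name, lisp_fn fn)
{
    assert(nsymbols < MAX_SYMBOLS);
    table[nsymbols].name = name;
    table[nsymbols].fn = fn;
    nsymbols++;
}

/* Host side: find a registered entry point by name, or NULL. */
lisp_fn lookup_symbol(const char *name)
{
    for (size_t i = 0; i < nsymbols; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].fn;
    return NULL;
}

/* Module side: the other functions in the "object file". */
static int square(int x) { return x * x; }
static int twice(int x) { return 2 * x; }

/* Module side: the first function, which knows about all the others. */
void module_init(register_fn reg)
{
    reg("SQUARE", square);
    reg("TWICE", twice);
}

/* Loader side: "branch to the start of the text area" of the module. */
void load_module(void (*entry)(register_fn))
{
    entry(register_symbol);
}
```

After `load_module(module_init)`, the host can call `lookup_symbol("SQUARE")(7)` without having known about `square` at link time.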
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/22/88)
In article <202@sdti.UUCP> mjy@sdti.UUCP (Michael J. Young) writes: >in which a program "compiles" a block of code into its data segment (created >via malloc() perhaps?) and then tries to execute it. In general, there is no support for doing this. Some specific implementations may have the necessary hooks, if the architecture permits it. There are several obvious alternative portable approaches to doing something of this general nature. Which to pursue would depend on what you're really trying to accomplish, functionally.
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/22/88)
In article <730@umbc3.UMD.EDU> alex@umbc3.UMD.EDU (Alex S. Crain) writes: >loads the .o file as data, and then branches to the start of the text area >of the .o file This cannot possibly work on an architecture that enforces the distinction between Instruction and Data spaces.
ed@mtxinu.UUCP (Ed Gould) (01/22/88)
>Is there a way in Unix to create an "alias" between the text and data >segments of a process? More specifically, how does one go about executing a >block of code that was generated in a data segment? It depends on the hardware architecture and on the implementation of the operating system. Some hardware (e.g., some PDP-11 models) allows for a strict separation of instruction ("text") and data spaces. On such a machine, if the OS uses the feature, then (unless the text space is writable, which it typically isn't) you're out of luck. Other machines, like the VAX, do not rigidly separate instruction and data spaces. Even in this class of machine, though, there may be pitfalls: if the hardware has separate read and execute permissions on regions of memory, then the same problem arises. If the OS does not supply execute permission for the data segment, then code can't be executed from there. -- Ed Gould mt Xinu, 2560 Ninth St., Berkeley, CA 94710 USA {ucbvax,uunet}!mtxinu!ed +1 415 644 0146 "`She's smart, for a woman, wonder how she got that way'..."
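On a present-day Linux system the per-region permissions Ed describes can be inspected directly. The sketch below assumes Linux's `/proc/self/maps` format (one `start-end perms offset dev inode path` line per mapping); `region_perms` is a hypothetical helper, not a library call. On a typical x86-64 Linux, data and stack show up as `rw-p` and program text as `r-xp`, which is exactly the "data segment without execute permission" case.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the 4-character permission field ("rw-p", "r-xp", ...) of the
   mapping that contains addr into perms. Linux-specific: parses
   /proc/self/maps. Returns 0 on success, -1 if no mapping matched. */
int region_perms(uintptr_t addr, char perms[5])
{
    assert(perms != NULL);
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[4096];
    int rc = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;
        char p[5];
        /* Each line starts "start-end perms ..."; addresses are hex. */
        if (sscanf(line, "%lx-%lx %4s", &lo, &hi, p) == 3 &&
            addr >= lo && addr < hi) {
            memcpy(perms, p, sizeof p);
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```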
david@linc.cis.upenn.edu (David Feldman) (01/22/88)
You can execute out of the data segment, at least on SOME Unix systems. In Ultrix, you can tell the loader to make the code "IMPURE", although with cc you usually get demand paged pure executables unless you specify the right option for ld. You can also execute code out of the stack, of course, and if you catch signals you are forced to do this. On receiving a signal, Ultrix inserts a segment of code above the stack in the stack space - on a VAX at least. This code is the infamous 'sigtramp'. So, yes, a program can be modified while it is running. As an aside, I had planned on writing a machine simulator which executed code out of a malloc'ed memory space. I never started the project, but I was able to get some assembly running that jumped into a malloc space and then out again. I would assume that any Unix running on a machine that does not enforce separate I & D could do this. Check the manual page for ld. Dave F. david@linc.cis.upenn.edu
jc@minya.UUCP (01/23/88)
In article <7156@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: > In article <730@umbc3.UMD.EDU> alex@umbc3.UMD.EDU (Alex S. Crain) writes: > >loads the .o file as data, and then branches to the start of the text area > >of the .o file > > This cannot possibly work on an architecture that enforces the > distinction between Instruction and Data spaces. Jeez, why do they let such obvious non-wizards post responses to unix.wizards? (:-) There have been far too many such comments from people who obviously haven't RTFM, in this case K&R. Study the following program, which should work anywhere you have a C compiler. (If your compiler doesn't do it right, send it back to the factory; it's obviously broken.)
| #include <stdio.h>
| char *code; /* This can point to any address in memory */
| int (*fct)(); /* We will point this at *code */
|
| foobar(x,y)
| { printf("foobar(%d,%d)\n",x,y);
|   return x + y;
| }
| main()
| { int i;
|
|   code = (char*)foobar; /* This could be malloc() */
|   fct = (int(*)())code; /* Stuff random pointer into fct */
|   i = (*fct)(7,9); /* Call random memory location */
|   printf("(*fct)(7,9)=%d\n",i);
|   exit(0);
| }
Well, OK, he asked if there is Unix support, and there isn't. So who needs it? This oughta work on VMS or MS/DOS, too. -- John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)
jc@minya.UUCP (01/23/88)
You know, it occurred to me that the I-space vs D-space just might be a problem, so I looked in the manuals for this machine (which might as well remain incognito). Sure enough, there is supposedly separate address spaces for both, as well as for IO-space. So I modified the little program I posted earlier:
| #include <stdio.h>
| extern char *malloc();
| char *code; /* This will point into D-space */
| int (*fct)(); /* So will this */
|
| foobar(x,y)
| { printf("foobar(%d,%d)\n",x,y);
|   return x + y;
| }
| main()
| { int i;
|   char *p, *q;
|
|   code = (char*)malloc(1000);
|   p = code;
|   q = (char*)foobar;
|   while (p < code+1000) /* Make copy of foobar() in code[] */
|     *p++ = *q++;
|   fct = (int(*)())code; /* Point to the malloc()ed data area */
|   i = (*fct)(7,9); /* Call the copy */
|   printf("(*fct)(7,9)=%d\n",i);
|   exit(0);
| }
It compiled and linked without problems, so with a bit of trepidation I told Unix (Sys/V) to run it, and guess what it said? Give up? OK, here's the output:
| foobar(7,9)
| (*fct)(7,9)=16
Seems like it worked, didn't it? Clever people, those C compiler writers! -- John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/25/88)
In article <452@minya.UUCP> jc@minya.UUCP (John Chambers) writes: >In article <7156@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: >> In article <730@umbc3.UMD.EDU> alex@umbc3.UMD.EDU (Alex S. Crain) writes: >> >loads the .o file as data, and then branches to the start of the text area >> >of the .o file >> This cannot possibly work on an architecture that enforces the >> distinction between Instruction and Data spaces. >Jeez, why do they let such obvious non-wizards post responses to >unix.wizards? (:-) There have been far too many such comments from >people who obviously haven't RTFM, in this case K&R. This issue has nothing to do with K&R. It has to do with hardware realities. If the I&D space distinction is enforced, as it is for example using "cc -i" on PDP-11s, then it is indeed impossible to execute anything out of data space. In fact, for such PDP-11s, the same range of addresses means two totally different things, depending on whether data is being accessed or an instruction is being fetched for execution. >Study the following program, which should work anywhere you have >a C compiler. Your example takes an I-space address, stashes it in a pointer (of inappropriate type, but that's not the issue here), then invokes an already-compiled function (which lives in I-space) using it. Of COURSE you can invoke an I-space function via a pointer. That is NOT AT ALL the same as what was requested, which was to invoke a portion of D-space as a function. THAT cannot be done on a split-I&D PDP-11, for example. Different physical memory locations correspond to an I-space virtual address and the SAME NUMERICAL VALUE as a D-space virtual address. If you still don't understand this, go find a split I&D PDP-11 and play with it for a while, or contact me for clarification, rather than spreading erroneous information across the net.
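To make Doug's point concrete, here is a toy model (not an emulator of any real PDP-11) in which one numeric address names two different memories: loads and stores use D-space, instruction fetch uses I-space. The struct and function names are made up for illustration. Writing "generated code" into data at address 0x10 has no effect on what a jump to 0x10 would fetch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy split-I&D machine: 8-bit addresses, two separate 256-byte memories. */
struct machine {
    uint8_t ispace[256];   /* reached only by instruction fetch */
    uint8_t dspace[256];   /* reached only by loads and stores */
};

void machine_reset(struct machine *m)
{
    assert(m != NULL);
    memset(m, 0, sizeof *m);
}

/* Loads and stores (including writes into malloc()ed memory) go to D-space. */
void d_store(struct machine *m, uint8_t addr, uint8_t value) { m->dspace[addr] = value; }
uint8_t d_load(const struct machine *m, uint8_t addr) { return m->dspace[addr]; }

/* A jump to addr fetches its next instruction from I-space. */
uint8_t i_fetch(const struct machine *m, uint8_t addr) { return m->ispace[addr]; }
```

Storing 0x99 at D-space address 0x10 and then fetching from 0x10 still yields whatever the linker placed in I-space there.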
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/25/88)
In article <453@minya.UUCP> jc@minya.UUCP (John Chambers) writes: >Sure enough, there is supposedly separate address spaces for both... You have to ask the linker for this feature, assuming your hardware and OS support it; it's not the default. >| code = (char*)malloc(1000); code -> D-space. >| fct = (int(*)())code; /* Point to the malloc()ed data area */ >| i = (*fct)(7,9); /* Call the copy */ If this works, you do not have split I&D space.
rbj@icst-cmr.arpa (Root Boy Jim) (01/26/88)
From: Doug Gwyn <gwyn@brl-smoke.arpa> In article <730@umbc3.UMD.EDU> alex@umbc3.UMD.EDU (Alex S. Crain) writes: >loads the .o file as data, and then branches to the start of the text area >of the .o file This cannot possibly work on an architecture that enforces the distinction between Instruction and Data spaces. While true, most such machines do not *insist* on enforcing the distinction, or provide mechanisms around it where appropriate. Thus it is possible to build three or four types of executables on a given system:
(1) Text and Data glommed together with only limits protection.
(2) Text sharable (and thus unmodifiable), data separate & possibly executable.
(3) Text and Data separate in separate I/D spaces. Here they share the same address range and as Doug mentions, never the twain shall meet.
(4) Demand paged, which is more or less like (2) above.
Each format has its own advantages and drawbacks. If you want to set breakpoints in code, you must use the first type. If you want to dynamically load code, you must either use this format, or execute from data space if possible. This kind of stuff tends to vary across machines and systems. The split I/D space PDP-11 is perhaps a bad example, as it is (or was) possible to build other format executables, but I'm sure Doug *has* seen machines where this is impossible. (Root Boy) Jim Cottrell <rbj@icst-cmr.arpa> National Bureau of Standards Flamer's Hotline: (301) 975-5688 I feel like a wet parking meter on Darvon!
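Jim's four formats correspond, as far as I know, to the classic a.out magic numbers in V7/BSD `<a.out.h>`: 0407 (impure), 0410 (shared pure text), 0411 (separate I&D), and 0413 (demand paged, 4BSD). The hypothetical helper below reads the magic word from the first two header bytes in little-endian order (the PDP-11 and VAX byte order) and returns the matching item number from the list, or 0 for anything else.

```c
#include <assert.h>
#include <stddef.h>

/* Return the item number (1-4) from the list above for a classic a.out
   header, or 0 if the magic number is not one of the four. */
int aout_type(const unsigned char *hdr, size_t len)
{
    assert(hdr != NULL && len >= 2);
    unsigned magic = hdr[0] | (hdr[1] << 8);   /* little-endian 16-bit word */

    switch (magic) {
    case 0407: return 1;   /* impure: text and data contiguous, text writable */
    case 0410: return 2;   /* pure: read-only shareable text, data follows */
    case 0411: return 3;   /* separate I and D address spaces */
    case 0413: return 4;   /* demand paged */
    default:   return 0;
    }
}
```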
mjy@sdti.UUCP (Michael J. Young) (01/28/88)
In article <11476@brl-adm.ARPA> rbj@icst-cmr.arpa (Root Boy Jim) writes: >(1) Text and Data glommed together with only limits protection. > ... >Each format has its own advantages and drawbacks. If you want to set >breakpoints in code, you must use the first type. Actually, setting breakpoints is no problem even with the other three types. That's what the ptrace(2) system call is for. I suppose you could even use ptrace() to "poke" an entire function into the text space. The problem arises when you want to change the size of the text region; ptrace() doesn't let you do that. -- Mike Young - Software Development Technologies, Inc., Sudbury MA 01776 UUCP : {decvax,harvard,linus,mit-eddie}!necntc!necis!mrst!sdti!mjy Internet : mjy%sdti.uucp@harvard.harvard.edu Tel: +1 617 443 5779
naren@vcvax1.UUCP (naren) (01/28/88)
> In article <7156@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: > > In article <730@umbc3.UMD.EDU> alex@umbc3.UMD.EDU (Alex S. Crain) writes: > > >loads the .o file as data, and then branches to the start of the text area > > >of the .o file > > > > This cannot possibly work on an architecture that enforces the > > distinction between Instruction and Data spaces. > > Jeez, why do they let such obvious non-wizards post responses to > unix.wizards? (:-) There have been far too many such comments from > people who obviously haven't RTFM, in this case K&R. > > [Sample program that malloc()'s and typecasts result to a func. ptr. deleted] > > John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393) Doug Gwyn is right about architectures that enforce distinctions between code and data spaces (ex: 80386). On UNIX/386, an sbrk() allocates space in the Data Segment of the process. Type casting this pointer and issuing a 'call' to this address will result in a protection exception. Now, if you REALLY want to do this, you could write a new system call like mktext(vaddr, length) where vaddr is the start of the data space you would like to fill in with code. mktext() would just create a new code segment descriptor in the LDT of your task that includes the desired section of data space and then you'd be all set. I am of course leaving out a lot of the nitty-gritty details of how this feature would interact with other things like shared texts, etc. ...!{harvard,mit-eddie}!cybvax0!vcvax1!naren Naren Nachiappan.(617/661-1230)
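Naren's `mktext()` is hypothetical, but its core is easy to show. An 80386 segment descriptor packs base, limit, and an access byte into 8 bytes; a code descriptor that aliases a data region has the same base and limit and differs only in the type bits. The sketch below only builds the descriptor value (installing it in an LDT is the kernel's job and is not shown); `make_descriptor` is a hypothetical helper, and it assumes 4 KiB granularity and a 32-bit default operand size.

```c
#include <assert.h>
#include <stdint.h>

/* Build an 80386 segment descriptor as a 64-bit value (byte 0 lowest).
   Assumes G=1 (limit counted in 4 KiB pages) and D=1 (32-bit segment).
   code != 0 gives type 0xA (execute/read); code == 0 gives type 0x2
   (read/write data). dpl is the privilege level, 3 for user mode. */
uint64_t make_descriptor(uint32_t base, uint32_t limit, int code, unsigned dpl)
{
    assert(limit <= 0xFFFFFu && dpl <= 3);
    uint64_t access = 0x80u                    /* P: segment present */
                    | (dpl << 5)               /* DPL */
                    | 0x10u                    /* S: code/data, not system */
                    | (code ? 0x0Au : 0x02u);  /* type */
    uint64_t flags = 0xC0u | ((limit >> 16) & 0xFu); /* G, D, limit 19:16 */

    return (uint64_t)(limit & 0xFFFFu)               /* bits  0-15 */
         | ((uint64_t)(base & 0xFFFFFFu) << 16)      /* bits 16-39 */
         | (access << 40)                            /* bits 40-47 */
         | (flags << 48)                             /* bits 48-55 */
         | ((uint64_t)(base >> 24) << 56);           /* bits 56-63 */
}
```

For a flat 4 GiB user segment this yields the familiar 0x00CFFA000000FFFF (code) and 0x00CFF2000000FFFF (data); for any region, the code and data descriptors differ in a single bit of the type field.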
weiser.pa@xerox.com (01/28/88)
"...in which a program "compiles" a block of code into its data segment (created via malloc() perhaps?) and then tries to execute it. " Just do it. Works fine on Suns and Vaxes. -mark
jc@minya.UUCP (John Chambers) (01/28/88)
In article <11476@brl-adm.ARPA>, rbj@icst-cmr.arpa (Root Boy Jim) writes: > From: Doug Gwyn <gwyn@brl-smoke.arpa> > > This cannot possibly work on an architecture that enforces the > > distinction between Instruction and Data spaces. > While true, most such machines do not *insist* on enforcing the distinction, > or provide mechanisms around it where appropriate. Thus it is possible to > build three or four types of executables on a given system: > > [examples deleted] Jeez, what a turkey! Here I was enjoying all the flames I was getting from people telling me what a fool I was thinking that my examples might work on machines with separate I and D spaces, and you had to go post descriptions of how it might be implemented. Now I'm going to have to find some other, much less entertaining stuff to read. At least I have a few good SF books to turn to. BTW, do you recall back in the early days of the obscure-C contest, there was a cute entry that started: short main[] = { followed by a jumbled list of numbers in various formats? It was a program that ran on PDP-11s and VAXen and did something reasonably silly. Anyhow, I tried it out on a PDP-11/75 that I had handy. The machine definitely had separate I and D spaces, and the program quite definitely worked. I didn't tell the compiler anything special, and I doubt the linker recognized that _main was special and belonged in I space. But neither the compiler nor the linker was fazed by having main as a data array. So far, I haven't heard from anyone that has claimed to try either of my posted examples and found them not to work. I was really hoping I'd get responses from people telling me where they failed and how. Well, maybe if I wait long enough, I'll learn something. [I do know personally of one commercial system where breakpoints don't work because I space is unwritable; I won't let on which one, in hopes I'll learn of more.] On to the SF books...
-- John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)
wolfgang@mgm.mit.edu (Wolfgang Rupprecht) (01/29/88)
In article <207@sdti.UUCP> mjy@sdti.UUCP (0000-Michael J. Young) writes: >Actually, setting breakpoints is no problem even with the other three types. >That's what the ptrace(2) system call is for. If you have a shared-text type of executable, you can't guarantee ptrace-ability. If someone else is executing the same text, the system is forced to deny you write-permission with the error 'text busy'. (Otherwise the other process would also get hit with the breakpoints.) The other process can still be corrupted, however, if it is started *after* you insert the breakpoints. Now it gets amusing, since you can't *remove* the breakpoints once the other process is started. -- Wolfgang Rupprecht ARPA: wolfgang@mgm.mit.edu (IP 18.82.0.114) Freelance Consultant UUCP: mit-eddie!mgm.mit.edu!wolfgang Boston, Ma. VOICE: Hey_Wolfgang!_(617)_267-4365
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/30/88)
In article <749@umbc3.UMD.EDU> alex@umbc3.UMD.EDU (Alex S. Crain) writes: >And if you can do it on your hardware, the method described will work. But it may not work next year, even on the same hardware but especially if you're porting to another system. If these are valid concerns, then he should find another way to accomplish what he's after.
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/30/88)
In article <459@minya.UUCP> jc@minya.UUCP (John Chambers) writes: >> From: Doug Gwyn <gwyn@brl-smoke.arpa> >> > This cannot possibly work on an architecture that enforces the >> > distinction between Instruction and Data spaces. >I tried it out on a PDP-11/75 that I had handy. The machine definitely had >separate I and D spaces, and the program quite definitely worked. I didn't >tell the compiler anything special, and I doubt the linker recognized that >_main was special and belonged in I space. But neither the compiler nor the >linker was fazed by having main as a data array. You don't listen very well, do you? Just because the underlying hardware CAN enforce the distinction between I&D space doesn't mean that it always DOES so. In fact, the usual UNIX C implementation for a PDP-11 defaults to a single shared address space, and only programs that need more space (such as the f77 compiler) request split I&D spaces by specifying the cc -i option when linking. Try running these example programs with I&D separation enforced, AS I SPECIFIED, and see what happens. As someone (rbj?) said, the PDP-11 isn't the best example, due to its being possible to set it up to blur the I&D distinction by default. (Some cheaper models couldn't be set up to enforce the distinction!) I used the PDP-11 as an example because it seemed the machine you were most likely to have access to. I have seen segment-based architectures (Burroughs B5500 comes to mind) where the default behavior is to enforce the distinction, and I would be very surprised if IBM's System/38 or the H-P 3000 didn't also do so. The way to understand an issue is not to resort to blind experiments.
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/30/88)
In article <246@vcvax1.UUCP> naren@vcvax1.UUCP (naren) writes: > Now, if you REALLY want to do this, you could write a new system call >like mktext(vaddr, length)... In case it isn't obvious to everyone, the reason why this can be done is that the operating system kernel has special privileges and can therefore shuffle around virtual->physical address mappings and associated attributes, but an ordinary user-mode process cannot do this itself. That's why IPC via shared memory requires kernel support, for example.
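On current POSIX systems that kernel service is exposed through `mmap()` and `mprotect()`: the process asks the kernel to change a page from read/write to read/execute. The sketch below is a minimal, x86-only illustration that emits `mov eax, imm32; ret` (bytes B8, the 32-bit immediate, C3); `jit_return_constant` is a hypothetical name. Systems with strict W^X policies may refuse the `mprotect()`, and other CPUs would need different bytes plus an instruction-cache flush; in those cases the sketch simply returns -1.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*int_fn)(void);

/* Generate "return value;" as machine code in a fresh data page, ask the
   kernel to make the page executable, and call it. x86/x86-64 only.
   Returns 0 and stores the result in *out, or -1 if unsupported/refused. */
int jit_return_constant(int32_t value, int *out)
{
    assert(out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    unsigned char code[6];
    code[0] = 0xB8;                          /* mov eax, imm32 */
    memcpy(&code[1], &value, sizeof value);  /* immediate, little-endian */
    code[5] = 0xC3;                          /* ret */

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* Step 1: an ordinary read/write page, i.e. "data space". */
    void *mem = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);

    /* Step 2: the kernel changes the page's attributes to read/execute. */
    if (mprotect(mem, (size_t)page, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, (size_t)page);
        return -1;
    }

    /* ISO C leaves object-to-function pointer conversion undefined;
       POSIX systems support it (dlsym() depends on it). */
    int_fn fn = (int_fn)mem;
    *out = fn();
    munmap(mem, (size_t)page);
    return 0;
#else
    (void)value;
    return -1;
#endif
}
```

Note that the page is never writable and executable at the same time; the flip from one to the other is exactly the kind of mapping change only the kernel can make.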
mjy@sdti.UUCP (Michael J. Young) (02/02/88)
In article <246@vcvax1.UUCP> naren@vcvax1.UUCP (naren) writes: > Doug Gwyn is right about architectures that enforce distinctions >between code and data spaces (ex: 80386). On UNIX/386, an sbrk() allocates >space in the Data Segment of the process. Type casting this pointer and >issuing a 'call' to this address will result in a protection exception. This happens on many other 80x86 ports as well. Microport (the only 286 port I'm familiar with) enforces separation between text and data regions as well. Unfortunately, they don't seem to provide ld(1) options to override the protection. I received an email reply from T. Andrews, who said that Xenix/286 provides a service and an ld(1) option to support this, but I have no personal experience with it. > Now, if you REALLY want to do this, you could write a new system call >like mktext(vaddr, length) where vaddr is the start of the data space >you would like to fill in with code. mktext() would just create a new code >segment descriptor in the LDT of your task that includes the desired >section of data space and then you'd be all set. > I am of course leaving out a lot of the nitty-gritty details of >how this feature would interact with other things like shared texts, etc. ... and where to get the money to buy a source license! :-) Seriously, though. It seems to me that any implementation of Unix that enforces separation should also provide a means around it, preferably in a portable manner. Does POSIX address this issue? On systems that enforce separation of text and data, with no means of "turning it off", it seems you are forced into using exec(2). Can you imagine trying to implement an incremental compiler where each new function you create has to have its own a.out and be its own process? -- Mike Young - Software Development Technologies, Inc., Sudbury MA 01776 UUCP : {decvax,harvard,linus,mit-eddie}!necntc!necis!mrst!sdti!mjy Internet : mjy%sdti.uucp@harvard.harvard.edu Tel: +1 617 443 5779
jc@minya.UUCP (John Chambers) (02/04/88)
In article <7209@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: > In article <246@vcvax1.UUCP> naren@vcvax1.UUCP (naren) writes: > > Now, if you REALLY want to do this, you could write a new system call > >like mktext(vaddr, length)... > > In case it isn't obvious to everyone, the reason why this can be done is that > the operating system kernel has special privileges and can therefore shuffle > around virtual->physical address mappings and associated attributes, but an > ordinary user-mode process cannot do this itself. That's why IPC via shared > memory requires kernel support, for example. Uh, no it doesn't. (Well, maybe it does on a PDP-11 :-). I've personally worked on one system where we had shared memory with no support whatsoever from the kernel. The poor li'l kernel didn't even suspect we were doing it. Guess how we did it? Give up? (OK, turkey, tell 'em!) The processor in question came up with the MM registers mapped in the obvious way, so that real and virtual memory were identical. The kernel was kept in ignorance of the last few MM registers, which by some strange coincidence just happened to point to real memory 'way up in the address space. (Did I say that this was a machine with 32-bit addresses? Well, it was.) This chunk of real memory was quite distant from the main memory, and when the kernel did its scan for usable memory, it didn't find the high chunk. Like most Unix kernels, it only believed in a single contiguous piece of real memory. The effect was to map this small chunk of memory into the virtual address space of every process, without the knowledge of the kernel. Most processes also didn't realize it was there. Our run-time library did, and did some very interesting things with it, and very fast. OK, it's a kludge. But then, memory-mapped anything is exactly the same kind of kludge. Just classify it as a memory-mapped device (as are quite a lot of network interface boards these days), and it all makes sense. 
For instance, go talk to CMC about their ethernet boards. This example had the advantage that we didn't have to try to figure out how the Sys/V shm package works. I mean, it was pure elegance in comparison with how AT&T would have liked us to do it. (:-) -- John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)
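For comparison with John's kludge, the System V route looks like this: `shmget()` creates a segment, `shmat()` maps it into the caller's address space, and a child created by `fork()` inherits the attachment. Every step is a system call, which is Doug's point about kernel support. `shm_roundtrip` is a hypothetical helper and error handling is kept minimal.

```c
#define _XOPEN_SOURCE 700          /* System V IPC is an XSI extension */
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A child process writes msg into a System V shared memory segment; the
   parent reads it back into out. Returns 0 on success, -1 on error. */
int shm_roundtrip(const char *msg, char *out, size_t outlen)
{
    size_t len = strlen(msg) + 1;
    assert(len <= outlen);

    int id = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);   /* create */
    if (id == -1)
        return -1;
    void *seg = shmat(id, NULL, 0);                         /* attach */
    if (seg == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    int rc = -1;
    pid_t pid = fork();                /* child inherits the attachment */
    if (pid == 0) {
        memcpy(seg, msg, len);
        _exit(0);
    }
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0) {
            memcpy(out, seg, len);     /* parent sees the child's write */
            rc = 0;
        }
    }
    shmdt(seg);                        /* detach */
    shmctl(id, IPC_RMID, NULL);        /* and remove the segment */
    return rc;
}
```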