barmar@mit-eddie.UUCP (Barry Margolin) (12/29/83)
Most decent programming languages provide no way to create self-modifying code. The only high-level language I know of that even has this concept is COBOL, and I think it was removed in the infamous COBOL-80 standard; any particular implementation might also choose not to implement it as self-modifying code. Only in assembler can you use self-modifying code and KNOW that you are actually gaining performance by it, but who programs in assembler these days?

Hmm, I just thought of a way to do self-modifying code in a very high-level language: interpreted Lisp. You can easily find the list structure that is the definition of the current function and splice into it. However, if efficiency is what you are after, you will have to compile your program, and then it stops working.

-- Barry Margolin
ARPA: barmar@MIT-Multics
UUCP: ..!genrad!mit-eddie!barmar
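Barry's Lisp trick, splicing into the list structure of the running function's definition, has a rough analog in any language whose function bindings are mutable at run time. A minimal sketch in Python rather than Lisp (the function name and source string are invented for illustration):

```python
# A function that replaces its own definition on first call: it builds
# a new body from source text and rebinds its own module-level name.
# This rebinds the name rather than splicing into the old body, so it
# is the spirit of the Lisp trick, not the letter.

def greet():
    new_source = "def greet():\n    return 'patched'\n"
    exec(new_source, globals())   # rebinds the module-level name 'greet'
    return "original"

first = greet()    # runs the original body, which replaces itself
second = greet()   # runs the patched version
```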
minow@decvax.UUCP (Martin Minow) (12/30/83)
Anyone wishing to pursue this topic should read Eckhouse and Morris's book, Minicomputer Systems. Bob Morris has come up with a number of nifty self-modifying algorithms that do *VERY FAST* digital signal processing on the PDP-11/70 (FFT and LPC). He computes -- on the fly -- optimal multiply/shift algorithms and builds code segments that execute the FFT inner loop. Using this technique (which he has described in several publications, especially ICASSP proceedings), the 11/70 can outperform many dedicated signal processing systems. He has recently devised similar algorithms for the TI TMS320 signal processing micro.

Martin Minow
decvax!minow

PS: I think SNOBOL 4 allows self-modifying code. Many years ago, I remember seeing a program that read parameters, built a Fortran (or somesuch) program accordingly, then submitted the fully-configured program to the batch processor. Doing something similar on Unix (build the program, fork a compiler to build an executable image, then fork the image and "call" it using a pipe) should be trivial.
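Minow's build-a-program-from-parameters pattern is easy to show at small scale in Python, with exec standing in for the batch compiler. The generated routine below is invented for illustration:

```python
# Generate a specialized routine from run-time parameters, "compile"
# it, and call it. This is new code created at run time, not modified
# old code -- the parameter is baked into the generated source.

def make_scaler(factor):
    source = f"def scale(x):\n    return x * {factor}\n"
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["scale"]

double = make_scaler(2)
triple = make_scaler(3)
```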
henry@utzoo.UUCP (Henry Spencer) (12/31/83)
One should distinguish between real self-modifying code and doing compilation at "run time". In the latter case, the code is not really modifying itself: new code (not modified old code) is being created and then run. This is a useful technique that is not used as much as it should be, although admittedly it's not a trivial thing to do and it obviously has severe portability problems. Another example of this sort of thing is that the RasterOp implementation for the Blit compiles optimal code for each request rather than trying to build in all the special cases ahead of time. The compilation takes something like 600 us worst case, so the overhead involved is small. -- Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,linus,decvax}!utzoo!henry
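Henry's distinction, new code generated per request rather than old code modified, can be sketched in Python. The mask/shift parameters below are hypothetical stand-ins for a RasterOp request; the real Blit compiler emits machine code, not source text:

```python
# Compile a routine specialized to one request's parameters, instead
# of a general routine that re-tests every special case on each call.
# The parameters become constants in the generated code.

def specialize_blit(mask, shift):
    src = (f"def blit(word):\n"
           f"    return (word & {mask}) << {shift}\n")
    ns = {}
    exec(compile(src, "<rasterop>", "exec"), ns)
    return ns["blit"]

blit = specialize_blit(0x0F, 4)
out = blit(0xAB)    # only the low nibble survives, shifted up
```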
tim@unc.UUCP (12/31/83)
In response to the assertion that high-level languages don't allow self-modifying code, I must point out that C's pointer typing freedoms allow the addresses of functions to be converted into pointers to arbitrary data. The self-modification can be done completely in C, although it will be processing machine code, not C. (Needless to say, this requires that the program text not be loaded as read-only.) It is also possible to write complete functions in process memory and transfer control to them, and modify them. -- Tim Maroney, University of North Carolina at Chapel Hill duke!unc!tim (USENET), tim.unc@csnet-relay (ARPA)
crane@fortune.UUCP (John Crane) (01/03/84)
I can think of a very good reason for writing self-modifying code, or for a program to generate code which it later executes. I once wrote a report generator similar to RPG. I wrote it in IBM assembler because I needed the ability to generate code, modify it, and execute it in real time. I look for this ability in a language and consider its presence an asset and its absence a serious drawback.

I agree that the ALTER statement in COBOL is an abomination, as is the rest of the language as presently used. COBOL could be a useful business programming language if it had some of the control structures that C has. A lot is made of the ability of C and Ada to have separately-compiled modules which can be linked together, as well as functions which can be called with a list of parameters. The features were there in COBOL and PL/I, but programmers didn't understand their usefulness and IBM made them so difficult to use that everybody was afraid to try them.

In my opinion, COBOL's verbosity is not a drawback. It forces the programmer to document his code. I think every line of a C or assembler program should be documented, even if you as a programmer think the code is so obvious anybody could understand it. I came to this conclusion after years of trying to decipher other people's spaghetti. As a result, my C programs look a lot like COBOL.
borman@decvax.UUCP (Dave Borman) (01/04/84)
Several years ago (~1979/1980) at St. Olaf College (it's in Minnesota, stolaf on the net) a plotting package was written in C, to run under V7 on a PDP-11/70. One question was how to allow the user to input functions to be plotted, and how to evaluate them. What was decided upon was to write a function compiler, so that at run time the functions could be broken down into assembly and then executed. The machine instructions were put into an integer array, and then the array was cast into a function pointer and called for each point at which a value was needed. The compiler handled all normal arithmetic operations, plus summations, products, conditionals, constants, and references to other functions, both user-defined and from an internal library. The only drawback to this scheme is that since the code is generated in data space, in order to execute it you can't have compiled the program split I/D. The author of this function compiler is Steve Tarr.

-Dave Borman, decvax!borman
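The same interface can be sketched in Python, with bytecode rather than PDP-11 instructions as the compilation target. The expression syntax and the little internal library here are illustrative, not Tarr's:

```python
# Compile a user-entered expression once, then evaluate it at many
# points -- the shape of the St. Olaf function compiler, except the
# target is Python bytecode instead of machine code in an int array.

import math

def compile_function(expr):
    """Turn an expression in x, e.g. 'x*x + 1', into a callable."""
    code = compile(expr, "<user function>", "eval")
    # Expose a small internal library, as the original compiler did.
    env = {"sin": math.sin, "cos": math.cos, "pi": math.pi}
    return lambda x: eval(code, {"x": x, **env})

f = compile_function("x*x + 1")
points = [f(x) for x in range(4)]   # evaluate at each plotted point
```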
preece@uicsl.UUCP (01/05/84)
#R:mit-eddi:-109600:uicsl:6200003:000:761
uicsl!preece    Jan  4 12:24:00 1984

----------
Most decent programming languages provide no way to create self-modifying code. ... Only in assembler can you use self-modifying code and KNOW that you are actually gaining in performance by it, but who programs in assembler these days?
----------
I think your impression of assembler's death is exaggerated. Even if that were not the case, however, most of us write code that compilers turn into assembler; there's no reason the compiler couldn't generate code using such tricks. There are instances when a self-modifying program is a very natural expression of what the program is doing -- the example that comes to mind being the simple gate that closes itself after the first time through a loop.

scott preece
ihnp4!uiucdcs!uicsl!preece
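The self-closing gate preece mentions can be rendered in Python by rebinding the gate's name instead of patching an instruction (names invented for illustration):

```python
# A "gate" that does one-time setup, then closes itself: after the
# first call, the name is rebound to a no-op, so later trips through
# the loop pay nothing for the already-done check.

events = []

def gate():
    global gate
    events.append("setup")        # one-time initialization
    gate = lambda: None           # close the gate: future calls do nothing

for _ in range(5):
    gate()                        # only the first iteration does setup
```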
ss@wivax.UUCP (Sid Shapiro) (01/05/84)
I can think of yet-another-good-reason for self-modifying code: execution profilers. Without having access to the compiler code, I had to write an execution profiler. To do this I had the user insert one init-type routine as the first executable statement. The initializer then skipped through memory, saving and replacing all of the subroutine calls, function calls, etc. with calls to a counter/timer init routine. The returns returned to the counter, and the final user exit really returned to the report writer. (It was fun to write!) True, it was done in assembly language, but without access to the compiler, or writing a new simulator, how would YOU create a profiler?

-- Sid Shapiro -- Wang Institute of Graduate Studies
[cadmus, decvax, linus, apollo, bbncca, sco]!wivax!ss
ss.Wang-Inst@Csnet-Relay
(617)649-9731
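The shape of this profiler -- redirect every call through a counter that forwards to the real routine -- carries over to Python as a wrapper. A hedged sketch with the same bookkeeping but no instruction patching (all names invented):

```python
# Splice a counting wrapper in front of a function: the "call" is
# redirected to the counter, which then forwards to the real routine,
# much as the patched call instructions forwarded in memory.

import functools

call_counts = {}

def profiled(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] = call_counts.get(fn.__name__, 0) + 1
        return fn(*args, **kwargs)
    return wrapper

@profiled
def square(x):
    return x * x

total = sum(square(i) for i in range(10))
```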
mckeeman@wivax.UUCP (01/06/84)
Let us not forget Lisp and MockLisp. The ability to execute programs created on the fly is an essential power of these languages.
smh@mit-eddie.UUCP (Steven M. Haflich) (01/06/84)
The most extreme case of self-modifying code in my experience was a large system which modified *every* instruction in a large executable module. While working on the FORMAC project at IBM, I evaluated a nice program verification system for 360 Assembly Language systems. The verifier would find instructions which were never executed by a suite of test programs. It started with a copy of the executable code module and copies of the assembly listings for all its source components. The executable was massaged to replace all executable instructions with trap instructions, then execution of the test suite was begun. Each time a trap was hit, the verifier would replace the trap with the original instruction, mark the source line in the assembly listing, then restart execution. When the test suite was finished, it would locate and/or count unmarked lines in the assembly listing.

The reason for using code modification instead of attempting simple interpretive execution was so the test suite would run with reasonable speed: the execution time with the verifier would be the time taken by the suite without the verifier, plus a fixed amount of time per source line, not per instruction executed.

Before anyone asks, I no longer remember the name of this code verifier, nor know whether it was any kind of available product. However, I sure wish something like it existed for the C compiler/assembler today! In addition to profilers, self-modifying code is still used in some implementations of dynamic subroutine linkage.
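The verifier's question -- which instructions never ran under the test suite -- is the one a line-coverage tool answers. A small Python sketch with sys.settrace in place of trap instructions (the function and the one-call "suite" are invented):

```python
# Record which lines of a function execute, relative to its def line.
# sys.settrace plays the role of the traps: every executed line fires
# a hook, and unmarked lines are the dead code the verifier reported.

import sys

executed = set()

def tracer(frame, event, arg):
    if event == "line" and frame.f_code.co_name == "classify":
        executed.add(frame.f_lineno - classify.__code__.co_firstlineno)
    return tracer

def classify(n):                  # offset 0: the def line
    if n < 0:                     # offset 1
        return "negative"         # offset 2: never reached below
    return "non-negative"         # offset 3

sys.settrace(tracer)
result = classify(5)              # the entire "test suite"
sys.settrace(None)
```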
andrew@inmet.UUCP (01/07/84)
#R:fortune:-215700:inmet:4700014:000:365 inmet!andrew Jan 5 12:10:00 1984 Very interesting! I had proposed doing a similar thing many years ago ('72 or so) and everybody thought I was crazy! I ended up doing it, sort of, by using a crude interpreter instead - creating a parse tree and walking it to evaluate the function again each time an argument changed value. Andrew W. Rogers, Intermetrics ...{harpo|ima|esquire}!inmet!andrew
spaf@gatech.UUCP (Gene Spafford) (01/08/84)
This concerns the testing method whereby the code is modified with trap instructions. Richard DeMillo (a faculty member here at Georgia Tech) and some of his students did some work in the field of "program mutation" a while back. This scheme is akin to code modification with traps. Basically, what program mutation does is "mutate" the source code and then run a test suite on the mutations. "Viable" mutations are then saved for further analysis. Mutations are things like changing loop boundaries, re-ordering certain statements, and putting halt statements in sections of code that might never be reached. When a program had undergone a suite of mutation tests, the resulting program which behaved correctly could either be further tested, or compared by a human being. Some mutations might be found to be "benign" and of no further interest, but some mutations might point out flaws in the original coding or sections of code which were never executed.

As I understand it, this idea was actually applied to large sets of Cobol and Fortran programs, and may still be in use for some applications (I wasn't involved with the project in any way, so I don't know). Since it goes beyond simply replacing each instruction with a "trap," I thought it might be of interest. The bibliographies to the tech reports might also be of interest. If anybody is interested in more information, I suggest you write to:

    Richard DeMillo
    School of Information and Computer Science
    Georgia Institute of Technology
    Atlanta, GA 30332

and ask about his program mutation project.

-- Off the Wall of Gene Spafford
The Clouds Project, School of ICS, Georgia Tech, Atlanta GA 30332
CSNet: Spaf @ GATech    ARPA: Spaf.GATech @ CSNet-Relay
uucp: ...!{akgua,allegra,rlgvax,sb1,unmvax,ulysses,ut-sally}!gatech!spaf
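In miniature, mutation testing needs only a way to rebuild the program from mutated source and a suite to run against each mutant. A Python sketch (the program, mutations, and suite are all invented):

```python
# Program mutation in miniature: mutate the source (one operator at a
# time), run the test suite against each mutant, and flag the mutants
# the suite fails to kill.

source = "def absval(n):\n    return n if n >= 0 else -n\n"

def run_suite(fn):
    try:
        return fn(3) == 3 and fn(-3) == 3 and fn(0) == 0
    except Exception:
        return False

def make(src):
    ns = {}
    exec(src, ns)
    return ns["absval"]

mutants = [source.replace(">=", ">"),    # boundary mutation
           source.replace("-n", "n")]    # drop the negation

survivors = [m for m in mutants if run_suite(make(m))]
```

Here the dropped-negation mutant is killed by the fn(-3) check, while the boundary mutant survives: for integer n == 0, -n is still 0, so this is one of the "benign" mutations of no further interest.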
leichter@yale-com.UUCP (Jerry Leichter) (01/09/84)
Two quick comments: SNOBOL does, indeed, allow self-modifying code in several ways of varying power:

1. Functions are defined dynamically, as in LISP. You can re-define any function at any time, effectively changing the meaning of all calls.

2. The compiler is callable as a subroutine. You can take a string, convert it to an expression, and evaluate the expression. Alternatively, you can take a string that contains multiple statements, compile them into code, and transfer to the first instruction of the code. Note that both "unevaluated expressions" and "code" are data objects that can be passed around in SNOBOL code.

3. In case (2), if the code being compiled contains any labeled statements, the labels get defined, and can be transferred to. If a label is found that is already defined, the new value replaces the old. As a result, you can "overlay" code within the program, IF it is explicitly transferred to. (If control just flows into the code, it will continue to do so even if the label that used to point to the code no longer does so.)

This is not a completely general facility, but it is quite powerful. It is not very widely used. Among the more interesting uses is to have rarely-used functions be read in "on demand": you initially define the function to point to a stub that reads in the code, compiles it, re-defines the function, and transfers to it. I used these facilities to build an interactive SNOBOL called APLBOL; it provides a user interface almost the same as APL's -- you define and edit functions "on the fly". APLBOL passes them off to SNOBOL for compilation. You can do all sorts of unexpected things in SNOBOL if you understand the language well enough.

-- Jerry
decvax!yale-comix!leichter leichter@yale
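Jerry's read-on-demand stub (point 2 plus re-definition) translates almost directly. A Python sketch in which a source string stands in for code that would be read from a file (all names invented):

```python
# Demand loading via self-replacement: the name initially points at a
# stub that compiles the real body, re-defines the function under the
# same name, and forwards the call. Later calls skip the stub entirely.

REAL_BODY = "def expensive(x):\n    return x * 100\n"   # stands in for a file

loads = []

def expensive(x):
    global expensive
    loads.append("loaded")                 # happens exactly once
    ns = {}
    exec(compile(REAL_BODY, "<demand>", "exec"), ns)
    expensive = ns["expensive"]            # re-define under the same name
    return expensive(x)                    # transfer to the real code

a = expensive(1)
b = expensive(2)
```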
leichter@yale-com.UUCP (Jerry Leichter) (01/09/84)
Another comment: I don't believe the C spec provides any definition for what a pointer to a function IS; it just says you can use it to CALL the function. It need NOT point to the function's code. In fact, the C compiler that Apollo sells has function pointers pointing to "function blocks" that indirectly point to the function - needed because of cross-mapping-segment calls, or something like that. -- Jerry
leichter@yale-com.UUCP (Jerry Leichter) (01/09/84)
BTW, the most widespread occurrences of self-modifying code I know of are in that little-appreciated language, TECO. It is quite common to build code on the fly (needed much less in more recent dialects, which allow q-register references inside such things as file-name specifications -- but there are still many examples). It is also sometimes useful to modify code. I've used things like a pre-processor that, depending on whether you wanted logging from the main code or not, took the main code and either left it alone or actually physically removed the logging code. This was worthwhile because TECO is interpreted and not very fast, so you didn't want the overhead -- which was in an inner loop -- in "production" code. Rarely needed, but handy...

-- Jerry
leichter@yale-com.UUCP (Jerry Leichter) (01/09/84)
On modifying programs for speed: On the CDC 6000 (now CYBER 7x) machines, a call to a function, as generated by the FORTRAN compiler, always took up a full word. (The 6000 series has 60-bit words, and 15- and 30-bit instructions. A function call is 30 bits, and a return always goes to the NEXT word, so you can't put any code after it. The typical convention is to always put the call in the top 30 bits, and put something like traceback info in the bottom 30 bits.) Now, a full word can hold 4 instructions, and the call/return are relatively expensive (ANY control transfer costs a lot -- on the 6600, a 48-bit-mantissa floating multiply took 800 nanoseconds, while a branch took at least 1200 nanoseconds, and could take more). There thus arose the idea of short functions that OVERWROTE CALLS TO THEMSELVES. A typical example was a pair of functions, callable from FORTRAN, to do shifting. When first called, they replaced the function call with a shift instruction. This was eventually generalized to a function that received as an argument the code it was to overlay its calling instruction with. This is the Unix "modify the assembler's output to insert machine-specific code" done at run time.

Another example: The original SPITBOL on the IBM 360 was a true compiler, generating executable code. It implemented SNOBOL's tracing facilities by modifying the code to insert trace calls -- i.e., it acted much like many debuggers today.

-- Jerry
decvax!yale-comix!leichter leichter@yale
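The overwrite-your-own-call-site trick can't be done portably on stored instructions from a high-level language, but the pattern survives one level up: a routine called through a slot replaces that slot with the cheap operation itself. A hedged Python sketch, with a dispatch table standing in for the caller's instruction word (names invented):

```python
# First call pays the setup cost and "overwrites" the call: the slot
# it was reached through is replaced by the cheap operation, so later
# calls run the patched-in code directly.

calls_to_setup = []

def shift_setup(x):
    calls_to_setup.append(x)
    ops["shift"] = lambda x: x << 2   # overwrite the "call" with the shift itself
    return x << 2

ops = {"shift": shift_setup}

r1 = ops["shift"](1)    # first call: goes through the expensive setup
r2 = ops["shift"](2)    # later calls: run the patched-in operation
```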
rpw3@fortune.UUCP (01/09/84)
#R:fortune:-215700:fortune:15100006:000:1826
fortune!rpw3    Jan  9 02:49:00 1984

Another use of self-modifying code in TECO (the editor) was to solve the conflict between speed and documentation. TECO did not have comments, but it did have pretty nearly arbitrary tags as targets for goto's (anything between exclamation points, like !This is a tag, including the spaces!). So people who liked to use TECO's text-processing ability but were a bit concerned about leaving cryptic hieroglyphs lying around (TECO has been called the APL of editors) started using the practice of documenting big TECO macros (programs) with otherwise unused tags (which can include carriage returns, even):

    !begin! j   !make sure all is well by jumping to the top !
    3d          !delete the leftovers from last pass !

and so on. Now the only problem with that is that TECO is a character-at-a-time interpreter, and all those tags (comments) had to be parsed, so the macro (program) ran SLOOOOOOWW! So before long every macro started out with a little piece of code (editing commands) which saved what was in the current buffer, got the macro, edited out all the tags with a <CR> in them (leaving "real" tags), put the sped-up macro in a q-register, restored the buffer, and jumped to the stripped version (whew!). In some cases it meant a factor of 10 (!) in performance.

Thus it was that whole subsystems (mailing lists, label printers, order entry) came to be written and maintained in TECO. Not only that, but since the clerical people used TECO a little for their general WP jobs, they even started maintaining the macros themselves. There even came to be a full-screen editor written in TECO! (Was called either TED or KED, I forget.) Ah, the old days...

Rob Warnock
UUCP: {sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:  (415)595-8444
USPS: Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065
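The strip-the-comments pass is an ordinary text rewrite. A Python sketch of its core, using the tag convention described above (a comment-tag is any !...! containing a newline; the sample macro is abbreviated and hypothetical):

```python
# Strip comment-tags (any !...! tag containing a newline) from a macro
# before handing it to the interpreter, keeping short single-line tags
# as real goto targets -- the TECO speed-up in miniature.

import re

macro = ("!begin! j !make sure all is well by jumping to the top\n!"
         " 3d !delete the leftovers from last pass\n!")

# A comment-tag is !...! with a newline somewhere inside it.
stripped = re.sub(r"![^!]*\n[^!]*!", "", macro)
```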
barmar@mit-eddie.UUCP (Barry Margolin) (01/10/84)
---------------
From: rpw3@fortune.UUCP
    There even came to be a full-screen editor written in TECO!
    (Was called either TED or KED, I forget.) Ah, the old days...
---------------
How soon they forget. It is called EMACS, the original implementation. It was written at MIT using MIT's PDP-10 TECO (which was much extended over DEC's version).

-- Barry Margolin
ARPA: barmar@MIT-Multics
UUCP: ..!genrad!mit-eddie!barmar
debray@sbcs.UUCP (Saumya Debray) (01/10/84)
Prolog permits users to write self-modifying code pretty easily: in this language, program and data appear identical, so an assertion into the program database, or a retraction from it, effectively results in a change to the program. In fact, this feature is one of the major problems with the semantics of the language ("pure" Prolog has a very elegant semantics in terms of Horn clause logic).

In passing: Prolog has no concept of "global" variables. Therefore, attempts at simulating global variables, e.g. for gensym counters, usually use the self-modifying feature of Prolog (it could be done by passing the value of the gensym counter around explicitly as a parameter to everyone who needed it, but that would be a royal pain in the neck!), though the practice is frowned upon by Prolog purists.

-- Saumya Debray
SUNY at Stony Brook
Usenet: {floyd, bunker, cbosgd, mcvax, cmcl2}!philabs!sbcs!debray
        {allegra, teklabs, hp-pcd, metheus}!ogcvax!sbcs!debray
CSNet: debray@suny-sbcs@CSNet-Relay
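The gensym point can be made concrete in Python, with a mutable table playing the role of the assert/retract counter fact, next to the purist alternative that threads the counter explicitly (names invented):

```python
# Two gensyms: one with hidden mutable state (the assert/retract
# analog), one that threads the counter through the call explicitly.

# Mutable-state version.
_counter = {"n": 0}

def gensym(prefix="g"):
    _counter["n"] += 1                 # "retract" old fact, "assert" new one
    return f"{prefix}{_counter['n']}"

# Threaded version: caller passes the counter in and gets the new one back.
def gensym_pure(prefix, n):
    return f"{prefix}{n + 1}", n + 1

s1, s2 = gensym(), gensym()
p1, n = gensym_pure("g", 0)
p2, n = gensym_pure("g", n)
```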
preece@uicsl.UUCP (01/11/84)
#R:fortune:-215700:uicsl:6200005:000:559 uicsl!preece Jan 10 12:03:00 1984 The SAIL execution profiler used a set of macros to replace the begin and end 'statements' with calls to the timer/counter routine. It was a pain to use, though, since all the begins and ends had to be changed to begintims and endtims and the routine names inserted in them so that the profiler could give names to them. On the other hand, it involved no manipulation of the object code and could be turned off by simply replacing the macro package with another that simply turned begintim into begin, and so forth. scott preece ihnp4!uiucdcs!uicsl!preece
emjej@uokvax.UUCP (01/11/84)
#R:fortune:-215700:uokvax:9000015:000:213
uokvax!emjej    Jan  9 10:25:00 1984

Not to foster the use of self-modifying code, but a technique I've seen used under OS-9, in which you can't put self-modifying code in a module, is to push code onto the stack and then call it.

James Jones
ucbesvax.turner@ucbcad.UUCP (01/12/84)
#R:fortune:-215700:ucbesvax:4500007:000:1642
ucbesvax!turner    Jan  5 03:13:00 1984

To mention FORTH in this newsgroup is perhaps only to invite flamage (con-, and maybe pro-), but I think the problem of self-modifying code could in most cases be solved by providing a support library for building up segments of threaded code. One could then dynamically "compile" (dreaded FORTH re-definition of the word) and execute interpreted code that was as much as five times slower than *real* compiled code, but usually around five times faster than most interpreters written in high-level languages.

The niches are everywhere: just about every termcap-style device-description format I've seen has a BUGS entry -- "Really should be compiled." (This is true of both BSD termcap and a local, device-independent graphics driver that could use it even more.) The interactive function-plotter is another good example. Database-query compilers yet another.

With a threaded-code interpreter-kernel linked as a subroutine, and some macros and routines for segment definition, exception-trapping, etc., all that is left for many applications is an infix-to-postfix converter. Recursive descent or lex/yacc can be used. Then your code can be ported to any machine with an implementation of the library. Finally, since the threaded-code kernel and support routines can be defined in a largely machine-independent way, porting the library itself would involve a fairly small, one-time-only effort.

But there's a danger: someone might use it to bring up FORTH on your machine. Pretty soon, you'd have weird people all over the place, late-night drug-deals in the office, the whole bit.

---
Michael Turner (ucbvax!ucbesvax.turner)
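Turner's kernel-plus-converter split can be sketched in Python: each "word" is a primitive function reference, a compiled function is a list of them, and the inner interpreter just walks the list. The primitives and the sample thread are invented; a real library would add the infix front end:

```python
# A toy threaded-code kernel: a "compiled" function is a sequence of
# (primitive, operand) pairs, and the inner interpreter walks it.

def push_const(stack, env, value):
    stack.append(value)

def push_var(stack, env, name):
    stack.append(env[name])

def add(stack, env, _):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def mul(stack, env, _):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)

def run(thread, env):
    """The inner interpreter: walk the thread, calling each primitive."""
    stack = []
    for primitive, operand in thread:
        primitive(stack, env, operand)
    return stack.pop()

# Thread for x*x + 1, as an infix-to-postfix converter might emit it.
thread = [(push_var, "x"), (push_var, "x"), (mul, None),
          (push_const, 1), (add, None)]
value = run(thread, {"x": 6})
```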
brownell@h-aiken.UUCP (Dave Brownell) (01/12/84)
The reason the "purists" frown upon such self-modifying Prolog code as counters is actually significant. One of the great hopes for logic programming is that it will be an efficient road into parallel processing, such as with dataflow computers. Self-modifying code does not lend itself to parallel execution; training programmers in techniques that are already on the way out will make those techniques calcify like COBOL. Logic programming should force a different mind-set than serial languages; counters and side effects should NOT be in the domain of discussion.

Actually, you didn't even pick the most common example of SMC in Prolog: adding facts to the database!! Here's an example of a case where, to get useful programs written, you NEED to write "self-modifying" code. Does any other language make you do this?

Dave Brownell
Sequoia Systems (posting from Harvard University, Aiken Lab)
...wjh12!h-aiken!brownell
andree@uokvax.UUCP (01/13/84)
#R:fortune:-215700:uokvax:9000018:000:953
uokvax!andree    Jan 11 20:58:00 1984

Ok, enough. I've got to flame (mildly). First, under normal circumstances, there is NO good reason to write self-modifying code. Most of the things posted here could be done by on-the-fly code creation. This is NOT the same thing, and can actually be done cleanly (even in assembler!). The trouble is, very few languages will let you create and run code on the fly. You practically HAVE to have an interpretive system, and not all of those give you this facility. The best language at this is, of course, LISP. FORTH is nearly as good. SNOBOL doesn't do too badly either. I can't think of any others off the top of my head. Are there others?

Final note: `normal circumstances' means that you have the source, and aren't worried about getting that last microsecond/byte out of the system. There are cases where you need to do this, and such things are reasonable to do to meet spec (providing you have nice, clean, commented HLL source somewhere!).

<mike
preece@uicsl.UUCP (01/17/84)
#R:fortune:-215700:uicsl:6200007:000:794
uicsl!preece    Jan 16 08:21:00 1984

As with most programming tricks, there is nothing that says that self-modifying code HAS to be obscure or hard to understand or even unstructured. The principal rule must simply be: you must make clear to the reader exactly what you are doing. In the case of self-modifying code this would mean (1) using an appropriate mechanism and (2) providing comments specifying exactly when the modification takes place, what the modification does, and where the code is that performs the modification. In assembly language it would also be nice, where speed permits, to stick an ASCII constant (skipped over in the program flow) at the appropriate place so that the run-time effect is clearly marked (for the benefit of those debugging without source code).

scott preece
ihnp4!uiucdcs!uicsl!preece
saj@iuvax.UUCP (02/15/84)
#R:fortune:-215700:iuvax:11800008:000:684
iuvax!apratt    Jan  8 00:38:00 1984

I found another use for self-modifying code: speed. I wrote a graphics package (a pretty simple-minded one) for the IBM PC, and found that my line-drawing routine had two spots where one of three things needed doing: increment x (or y), decrement x (or y), or do nothing to x (or y). I could have loaded the proper add-on value (-1, 0, or 1) into a memory location or register and used an add instruction, but it was MUCH faster to have the program "poke" an increment, decrement, or NOP instruction into the appropriate spot. One (imprecise) comparison (timed with a watch) showed a 20% speed increase (approx. 4 seconds in 20).

-- Allan Pratt
...ihnp4!iuvax!apratt
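One level above the instruction poke, the same idea reads as: decide the step operation once, outside the loop, so the inner loop carries no direction test. A Python sketch of that restructuring (the real win was in 8088 machine code, not here):

```python
# Select the step operation once -- the analog of poking INC/DEC/NOP
# into the loop body -- so the inner loop never re-tests direction.

def make_stepper(delta):
    if delta > 0:
        return lambda v: v + 1     # the "INC" case
    if delta < 0:
        return lambda v: v - 1     # the "DEC" case
    return lambda v: v             # the "NOP" case

def walk(start, delta, steps):
    step = make_stepper(delta)     # decided once, like the one-time poke
    v = start
    for _ in range(steps):
        v = step(v)                # inner loop: no direction test
    return v
```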
geoff@proper.UUCP (Geoff Kuenning) (02/23/84)
> I found another use for self-modifying code: speed. I wrote a graphics
> package (a pretty simple-minded one) for the IBM PC, and found that my
> line-drawing routine had two spots where one of three things needed doing:
> increment x (or y), decrement x (or y), do nothing to x (or y).
>
> I could have loaded the proper add-on value (-1, 0, or 1) into a
> memory location or register and used an add instruction, but it was MUCH
> faster to have the program "poke" an increment, decrement, or NOP
> instruction into the appropriate spot. One (imprecise) comparison
> (timed with a watch) showed a 20% speed increase (approx. 4 seconds in 20).
> -- Allan Pratt

Oh, yeah? When I wrote the same routine, I simply used an 8-way "case". The test and branch has to be done anyway to do the self-modifying code, and it's probably cheaper than storing into an instruction. And the loop in question is so short that the amount of space it takes is totally irrelevant in any computer with more than 4K of memory.

Geoff Kuenning
Callan Data Systems
{decvax,ucbvax}!ihnp4!sdcrdcf!trwrb!wlbr!callan!geoff