[net.lang] Self-modifying code

barmar@mit-eddie.UUCP (Barry Margolin) (12/29/83)

Most decent programming languages provide no way to create
self-modifying code.  The only high-level language that I know of that
even has this concept is COBOL, and I think that it was removed in the
infamous COBOL-80 standard; also, any particular implementation might
choose not to implement this as self-modifying code.  Only in assembler
can you use self-modifying code and KNOW that you are actually gaining
in performance by it, but who programs in assembler these days?

Hmm, I just thought of a way to do self-modifying code in a very
high-level language: interpreted Lisp.  You can easily find the list
structure that is the definition of the current function and splice into
it.  However, if efficiency is what you are after then you will have to
compile your program, and then it stops working.
-- 
			Barry Margolin
			ARPA: barmar@MIT-Multics
			UUCP: ..!genrad!mit-eddie!barmar

minow@decvax.UUCP (Martin Minow) (12/30/83)

Anyone wishing to pursue this topic should read Eckhouse and Morris's
book, Minicomputer Systems.  Bob Morris has come up with a number
of nifty self-modifying algorithms that do *VERY FAST* digital
signal processing on the PDP-11/70 (FFT and LPC).

He computes -- on the fly -- optimal multiply/shift algorithms
and builds code segments that execute the FFT inner loop.

Using this technique (which he has described in several publications,
especially ICASSP proceedings), the 11/70 can outperform many
dedicated signal processing systems.

He has recently devised similar algorithms for the TI TMS320 signal
processing micro.

Martin Minow
decvax!minow

PS: I think SNOBOL 4 allows self-modifying code.  Many years ago,
I remember seeing a program that read parameters, built a Fortran
(or somesuch) program accordingly, then submitted the fully-configured
program to the batch processor.  Doing something similar on Unix
(build the program, fork a compiler to build an executable image,
then fork the image and "call" it using a pipe) should be trivial.
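That Unix recipe is easy to sketch in a modern scripting language; here Python stands in for both the generator and the generated Fortran, and `build_and_run` and its one-line payload are invented for illustration:

```python
import os
import subprocess
import sys
import tempfile

def build_and_run(scale):
    # Build a fully-configured program from the parameter...
    source = f"print(sum(i * {scale} for i in range(10)))\n"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        # ...then run it as a child process and read its output
        # back over a pipe.
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, check=True)
        return int(result.stdout)
    finally:
        os.unlink(path)

print(build_and_run(2))  # sum(0..9) * 2 = 90
```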

henry@utzoo.UUCP (Henry Spencer) (12/31/83)

One should distinguish between real self-modifying code and doing
compilation at "run time".  In the latter case, the code is not really
modifying itself:  new code (not modified old code) is being created
and then run.  This is a useful technique that is not used as much as
it should be, although admittedly it's not a trivial thing to do and
it obviously has severe portability problems.

Another example of this sort of thing is that the RasterOp implementation
for the Blit compiles optimal code for each request rather than trying
to build in all the special cases ahead of time.  The compilation takes
something like 600 us worst case, so the overhead involved is small.
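The Blit trick in miniature: compile a specialized routine per request instead of one general routine full of run-time tests. A hedged Python sketch (the real RasterOp compiler emitted machine code; `compile_rasterop` and its operand names are invented here):

```python
def compile_rasterop(op):
    # op is a small expression over s (source pixel) and d (dest pixel),
    # e.g. "s | d"; baking it into the loop leaves no per-pixel dispatch.
    src = ("def blit(dst, srcrow):\n"
           "    for i, s in enumerate(srcrow):\n"
           "        d = dst[i]\n"
           f"        dst[i] = {op}\n")
    ns = {}
    exec(src, ns)
    return ns["blit"]

or_blit = compile_rasterop("s | d")
row = [0b0011, 0b0101]
or_blit(row, [0b1100, 0b1010])
print(row)  # [15, 15]
```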
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

tim@unc.UUCP (12/31/83)

In response to the assertion that high-level languages don't allow
self-modifying code, I must point out that C's pointer typing freedoms allow
the addresses of functions to be converted into pointers to arbitrary data.
The self-modification can be done completely in C, although it will be
processing machine code, not C.  (Needless to say, this requires that the
program text not be loaded as read-only.)

It is also possible to write complete functions in process memory and
transfer control to them, and modify them.
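The underlying point, that a function's code is reachable as plain data, also holds in high-level implementations. A safe Python analogue (not C, and not actual machine-code patching): CPython exposes a function's instructions as bytes, and new function objects can be built around code objects at run time.

```python
def f(x):
    return x + 1

raw = f.__code__.co_code      # the function's instructions, as plain bytes
print(type(raw).__name__)     # bytes

# The other direction: construct a new function object around a code
# object at run time (here, f's own code under a new name).
g = type(f)(f.__code__, f.__globals__, "g")
print(g(41))  # -> 42
```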
--
Tim Maroney, University of North Carolina at Chapel Hill
duke!unc!tim (USENET), tim.unc@csnet-relay (ARPA)

crane@fortune.UUCP (John Crane) (01/03/84)

I can think of a very good reason for writing self-modifying code or
for a program to generate code which it later executes.

I once wrote a report generator similar to RPG. I wrote it in IBM
assembler because I needed the ability to generate code, modify it,
and execute it in real time.  

I look for this ability in a language and consider its presence an
asset and its absence a serious drawback.

I agree that the ALTER statement in COBOL is an abomination as is the
rest of the language as presently used. COBOL could be a useful
business programming language if it had some of the control structures
that C has. 

A lot is made of the ability of C and Ada to have
separately-compiled modules which can be linked together as well as
functions which could be called with a list of parameters. These features
were there in COBOL and PL/I, but programmers didn't understand their
usefulness, and IBM made them so difficult to use that everybody was afraid
to try them.

In my opinion, COBOL's verbosity is not a drawback. It forces
the programmer to document his code. I think every line of a C or
assembler program should be documented even if you as a programmer
think the code is so obvious anybody could understand it. I came to this
conclusion  after years of trying to decipher other people's spaghetti. As a
result, my C programs look a lot like COBOL.

borman@decvax.UUCP (Dave Borman) (01/04/84)

Several years ago (~1979/1980) at St. Olaf College (it's in Minnesota, stolaf
on the net) a plotting package was written in C, to run under V7 on a PDP
11/70. One question was how to allow the user to input functions to be plotted,
and how to evaluate them.  What was decided upon was to write a function
compiler, so that at run time the functions could be broken down into assembly
and then executed.  The machine instructions were put into an integer array,
and then the array was cast into a function pointer and called for each
point at which a value was needed.  The compiler handled all normal arithmetic
operations, plus summations, products, conditionals, constants, and references
to other functions, both user defined and from an internal library. The only
drawback to this scheme is that since the code is being generated in data
space, in order to execute it you can't have compiled the program split I/D.
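The same design, compile the user's function text once and call it per plotted point, can be sketched in Python, with the language's own compiler standing in for the hand-written one (`make_plottable` is an invented name; the original produced PDP-11 instructions in an int array):

```python
import math

def make_plottable(expr):
    # Compile the user's text once; the returned callable is then
    # invoked for each point at which a value is needed.
    code = compile(expr, "<user function>", "eval")
    return lambda x: eval(code, {"math": math, "x": x})

f = make_plottable("x * x + 1")
print([f(x) for x in range(4)])  # [1, 2, 5, 10]
```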

The author of this function compiler is Steve Tarr.

			-Dave Borman, decvax!borman

preece@uicsl.UUCP (01/05/84)

#R:mit-eddi:-109600:uicsl:6200003:000:761
uicsl!preece    Jan  4 12:24:00 1984

----------
	Most decent programming languages provide no way to create
	self-modifying code. ...
	...	Only in assembler
	can you use self-modifying code and KNOW that you are actually gaining
	in performance by it, but who programs in assembler these days?
----------
I think your impression of assembler's death is exaggerated. Even if that
were not the case, however, most of us write code that compilers turn
into assembler; there's no reason the compiler couldn't generate code
using such tricks (there are instances when a self-modifying program is
a very natural expression of what the program is doing, the example that
comes to mind being the simple gate that closes itself after the first
time through a loop).
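The self-closing gate can be sketched at a high level by having the routine rebind its own name, the scripting-language stand-in for overwriting the instruction (illustration only; `gate` and `calls` are invented names):

```python
calls = []

def gate():
    global gate
    calls.append("setup")   # the one-time work
    gate = lambda: None     # close the gate: later calls fall through

for _ in range(3):
    gate()

print(calls)  # the body ran exactly once
```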

scott preece
ihnp4!uiucdcs!uicsl!preece

ss@wivax.UUCP (Sid Shapiro) (01/05/84)

I can think of yet-another-good-reason for self-modifying code.
Execution profilers:

Without having access to the compiler code, I had to write an execution
profiler.  To do this I had the user insert one init-type routine as
the first executable statement.  The initializer then skipped through
memory saving and replacing all of the subroutine calls, function
calls, etc. with calls to counter/timer init routines.  The returns
returned to the counter and the final user exit really returned to the
report writer.  (It was fun to write!)

True, it was done in assembly language, but without access to the
compiler, or writing a new simulator, how would YOU create a
profiler?
-- 

Sid Shapiro -- Wang Institute of Graduate Studies
    [cadmus, decvax, linus, apollo, bbncca, sco]!wivax!ss
    ss.Wang-Inst@Csnet-Relay 
	  (617)649-9731

mckeeman@wivax.UUCP (01/06/84)

Let us not forget Lisp and MockLisp.  The ability to
execute programs created on the fly is an essential power
of these languages.

smh@mit-eddie.UUCP (Steven M. Haflich) (01/06/84)

The most extreme case of self-modifying code in my experience was
a large system which modified *every* instruction in a large executable
module.

While working on the FORMAC project at IBM, I evaluated a nice program
verification system for 360 Assembly Language systems.  The verifier
would find instructions which were never executed by a suite of test
programs.  It started with a copy of the executable code module and
copies of the assembly listings for all its source components.  The
executable was massaged to replace all executable instructions with trap
instructions, then execution of the test suite was begun.  Each time a
trap was hit, the verifier would replace the trap with the original
instruction, mark the source line in the assembly listing, then restart
execution.  When the test suite was finished, it would locate and/or
count unmarked lines in the assembly listing.  The reason for using code
modification instead of attempting simple interpretive execution was so
the test suite would run with reasonable speed:  The execution time with
the verifier would be the time taken by the suite without the verifier,
plus a fixed amount of time per source line, not per instruction
executed.
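The strategy amounts to one-shot breakpoints for coverage. A slow-but-simple high-level sketch of the same bookkeeping, using Python's tracing hook (invented example; the real verifier got its speed precisely by *not* paying a per-line cost after the first hit):

```python
import sys

executed = set()   # (function name, line offset within the function)

def tracer(frame, event, arg):
    if event == "line":
        code = frame.f_code
        executed.add((code.co_name, frame.f_lineno - code.co_firstlineno))
    return tracer

def classify(n):
    if n > 0:                  # offset 1
        return "positive"      # offset 2
    else:
        return "other"         # offset 4 -- never reached below

sys.settrace(tracer)
classify(1)
classify(2)
sys.settrace(None)

# Any line of classify() absent from `executed` was never exercised.
print(("classify", 4) in executed)  # False: the else branch never ran
```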

Before anyone asks, I no longer remember the name of this code verifier,
nor know whether it was any kind of available product.  However, I sure
wish something like it existed for the C compiler/assembler today!

In addition to profilers, self modifying code is still used in some
implementations of dynamic subroutine linkage.

andrew@inmet.UUCP (01/07/84)

#R:fortune:-215700:inmet:4700014:000:365
inmet!andrew    Jan  5 12:10:00 1984

Very interesting!  I had proposed doing a similar thing many years ago
('72 or so) and everybody thought I was crazy!  I ended up doing it,
sort of, by using a crude interpreter instead - creating a parse tree
and walking it to evaluate the function again each time an argument
changed value.
 
Andrew W. Rogers, Intermetrics    ...{harpo|ima|esquire}!inmet!andrew

spaf@gatech.UUCP (Gene Spafford) (01/08/84)

This concerns the testing method whereby the code is modified with trap
instructions.

Richard DeMillo (a faculty member here at Georgia Tech) and some of his
students did some work in the field of "program mutation" a while
back.  This scheme is akin to code modification with traps.  Basically,
what program mutation does is "mutate" the source code and then run a
test suite on the mutations.  "Viable" mutations are then saved for
further analysis.  Mutations are things like changing loop boundaries,
re-ordering certain statements, and putting halt statements in sections
of code that might never be reached.

When a program had undergone a suite of mutation tests, any resulting
mutant which still behaved correctly could either be tested further or
examined by a human being.  Some mutations might be found to be
"benign" and of no further interest, but some mutations might point out
flaws in the original coding or sections of code which were never
executed.

As I understand it, this idea was actually applied to large sets of
Cobol and Fortran programs, and may still be in use for some
applications (I wasn't involved with the project in any way, so I don't
know).  Since it goes beyond simply replacing each instruction with a
"trap," I thought it might be of interest.  The bibliographies to the
tech reports might also be of interest.  If anybody is interested in
more information, I suggest you write to:
	Richard DeMillo
	School of Information and Computer Science
	Georgia Institute of Technology
	Atlanta, GA 30332
and ask about his program mutation project.
-- 
Off the Wall of Gene Spafford
The Clouds Project, School of ICS, Georgia Tech, Atlanta GA 30332
CSNet:	Spaf @ GATech		ARPA:	Spaf.GATech @ CSNet-Relay
uucp:	...!{akgua,allegra,rlgvax,sb1,unmvax,ulysses,ut-sally}!gatech!spaf

leichter@yale-com.UUCP (Jerry Leichter) (01/09/84)

Two quick comments:

SNOBOL does, indeed, allow self-modifying code in several ways of varying power:

	1.  Functions are defined dynamically, as in LISP.  You can re-define
	    any function at any time, effectively changing the meaning of all
	    calls.
	2.  The compiler is callable as a subroutine.  You can take a string,
	    convert it to an expression, and evaluate the expression.  Alterna-
	    tively, you can take a string that contains multiple statements,
	    compile them into code, and transfer to the first instruction of
	    the code.  Note that both "unevaluated expressions" and "code"
	    are data objects that can be passed around in SNOBOL code.
	3.  In case (2), if the code being compiled contains any labeled
	    statements, the labels get defined, and can be transferred to.  If
	    a label is found that is already defined, the new value replaces
	    the old.  As a result, you can "overlay" code within the program,
	    IF it is explicitly transferred to.  (If control just flows into
	    the code, it will continue to do so even if the label that used to
	    point to the code no longer does so.)

This is not a completely general facility, but it is quite powerful.  It is
not very widely used.  Among the more interesting uses is to have rarely-used
functions be read in "on demand" - you initially define the function to point
to a stub that reads in the code, compiles it, re-defines the function, and
transfers to it.  I used these facilities to build an interactive SNOBOL called
APLBOL; it provides a user interface almost the same as APL's - you define and
edit functions "on the fly".  APLBOL passes them off to SNOBOL for compilation.
You can do all sorts of unexpected things in SNOBOL if you understand the
language well enough.
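The read-in-on-demand stub translates almost word for word into any language where definitions can be rebound at run time. A Python sketch (the literal `LIBRARY_SOURCE` string stands in for a file read on demand):

```python
LIBRARY_SOURCE = "def rarely_used(x):\n    return x * 10\n"

def rarely_used(x):
    global rarely_used
    ns = {}
    exec(compile(LIBRARY_SOURCE, "<library>", "exec"), ns)
    rarely_used = ns["rarely_used"]   # re-define the function...
    return rarely_used(x)             # ...and transfer to it

print(rarely_used(4))  # the first call loads and compiles the real body
print(rarely_used(5))  # later calls go to it directly
```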
							-- Jerry
					decvax!yale-comix!leichter leichter@yale

leichter@yale-com.UUCP (Jerry Leichter) (01/09/84)

Another comment:  I don't believe the C spec provides any definition for what
a pointer to a function IS; it just says you can use it to CALL the function.
It need NOT point to the function's code.  In fact, the C compiler that Apollo
sells has function pointers pointing to "function blocks" that indirectly
point to the function - needed because of cross-mapping-segment calls, or
something like that.
							-- Jerry

leichter@yale-com.UUCP (Jerry Leichter) (01/09/84)

BTW, the most widespread occurrences of self-modifying code I know of are in
that little-appreciated language, TECO.  It is quite common to build code on
the fly (it is needed much less in more recent dialects, which allow q-register
references inside such things as file-name specifications - but there are
still many examples).  It is also sometimes useful to modify code.  I've
used things like a pre-processor that, depending on whether you wanted logging
from the main code or not, took the main code and left it alone or actually
physically removed the logging code.  This was worthwhile because TECO is
interpreted and not very fast, so you didn't want the overhead - which was in
an inner loop - in "production" code.  Rarely needed, but handy...
								-- Jerry

leichter@yale-com.UUCP (Jerry Leichter) (01/09/84)

On modifying programs for speed:  On the CDC 6000 (now CYBER 7x) machines, a
call to a function, as generated by the FORTRAN compiler, always took up a
full word.  (The 6000 series has 60-bit words, and 15 and 30-bit instructions.
A function call is 30 bits, and a return always goes to the NEXT word, so you
can't put any code after it.  The typical convention is to always put the
call in the top 30 bits, and put something like traceback info in the bottom
30 bits.)  Now, a full word can hold 4 instructions, and the call/return are
relatively expensive (ANY control transfer costs a lot - on the 6600, a 48-bit
(mantissa) floating multiply took 800 nanoseconds, while a branch took at least
1200 nanoseconds, and could take more).  There thus arose the idea of short
functions that OVERWROTE CALLS TO THEMSELVES.  A typical example was a pair of
functions, callable from FORTRAN, to do shifting.  When first called, they
replaced the function call with a shift instruction.  This was eventually
generalized to a function that received as an argument the code it was to overlay
its calling instruction by.  This is the Unix "modify the assembler's output
to insert machine-specific code" done at run-time.
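A memory-safe, high-level cousin of calls that overwrite themselves: on first use, replace the expensive computation at the "call site" with its result, so later uses are plain loads. This Python descriptor is an invented illustration of the pattern, not the CDC mechanism:

```python
class once_then_inline:
    """First access runs the method, then overwrites the attribute on
    the instance, so later accesses involve no call at all."""
    def __init__(self, fn):
        self.fn = fn
    def __set_name__(self, owner, name):
        self.name = name
    def __get__(self, obj, objtype=None):
        value = self.fn(obj)
        obj.__dict__[self.name] = value   # overwrite the "call site"
        return value

calls = []

class Circle:
    def __init__(self, r):
        self.r = r
    @once_then_inline
    def area(self):
        calls.append(self.r)
        return 3.14159 * self.r * self.r

c = Circle(2)
first, second = c.area, c.area
print(len(calls), first == second)  # the body ran once; both reads agree
```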

Another example:  The original SPITBOL on the IBM 360 was a true compiler,
generating executable code.  It implemented SNOBOL's tracing facilities by
modifying the code to insert trace calls - i.e. it acted much like many
debuggers today.
							-- Jerry
					decvax!yale-comix!leichter leichter@yale

rpw3@fortune.UUCP (01/09/84)

#R:fortune:-215700:fortune:15100006:000:1826
fortune!rpw3    Jan  9 02:49:00 1984

Another use of self-modifying code in TECO (the editor) was to solve
the conflict between speed and documentation. TECO did not have comments,
but it did have pretty nearly arbitrary tags as targets for goto's (anything
between exclamation points, like !This is a tag, including the spaces!).
So people who liked to use TECO's text-processing ability but were a bit
concerned about leaving cryptic hieroglyphs lying around (TECO has been
called the APL of editors) started using the practice of documenting
big TECO macros (programs) with otherwise unused tags (which can include
carriage returns, even):

	!begin!
		j	!make sure all is well by jumping to the top
	!	3d	!delete the leftovers from last pass
	!

and so on. Now the only problem with that is that TECO is a character-at-
a-time interpreter, and all those tags (comments) had to be parsed and
so the macro (program) ran SLOOOOOOWW! So before long every macro started
out with a little piece of code (editing commands) which saved what was in
the current buffer, got the macro, edited out all the tags with a <CR> in
them (leaving "real" tags), put the sped up macro in a q-register, restored
the buffer and jumped to the stripped version (whew!). In some cases it
meant a factor of 10 (!) in performance. Thus it was that whole subsystems
(mailing lists, label printers, order entry) came to be written and
maintained in TECO. Not only that, but since the clerical people used TECO
a little for their general WP jobs, they even started maintaining the macros
themselves. There even came to be a full-screen editor written in TECO!
(Was called either TED or KED, I forget.) Ah, the old days...

Rob Warnock

UUCP:	{sri-unix,amd70,hpda,harpo,ihnp4,allegra}!fortune!rpw3
DDD:	(415)595-8444
USPS:	Fortune Systems Corp, 101 Twin Dolphins Drive, Redwood City, CA 94065

barmar@mit-eddie.UUCP (Barry Margolin) (01/10/84)

---------------
From: rpw3@fortune.UUCP
There even came to be a full-screen editor written in TECO!
(Was called either TED or KED, I forget.) Ah, the old days...
---------------

How soon they forget.  It is called EMACS, the original implementation.
It was written at MIT using MIT's PDP-10 TECO (which was much extended
over DEC's version).
-- 
			Barry Margolin
			ARPA: barmar@MIT-Multics
			UUCP: ..!genrad!mit-eddie!barmar

debray@sbcs.UUCP (Saumya Debray) (01/10/84)

Prolog permits users to write self-modifying code pretty easily: in this
language, program and data appear identical, so an assertion into the
program database, or a retraction from it, effectively results in a change
to the program. In fact, this feature is one of the major problems with the
semantics of the language ("pure" Prolog has a very elegant semantics in
terms of Horn Clause logic).

In passing: Prolog has no concept of "global" variables.  Therefore,
attempts at simulating global variables, e.g.  for gensym counters, usually
use the self-modifying feature of Prolog (it could be done by passing the
value of the gensym counter around explicitly as a parameter to everyone who
needed it, but that would be a royal pain in the neck!), though the practice
is frowned upon by Prolog purists.
-- 

Saumya Debray
SUNY at Stony Brook

		{floyd, bunker, cbosgd, mcvax, cmcl2}!philabs!
							      \
	Usenet:						       sbcs!debray
							      /
		   {allegra, teklabs, hp-pcd, metheus}!ogcvax!

	CSNet: debray@suny-sbcs@CSNet-Relay

preece@uicsl.UUCP (01/11/84)

#R:fortune:-215700:uicsl:6200005:000:559
uicsl!preece    Jan 10 12:03:00 1984

The SAIL execution profiler used a set of macros to replace the begin and
end 'statements' with calls to the timer/counter routine.  It was a
pain to use, though, since all the begins and ends had to be changed
to begintims and endtims and the routine names inserted in them so that
the profiler could give names to them.

On the other hand, it involved no manipulation of the object code and
could be turned off by simply replacing the macro package with another
that simply turned begintim into begin, and so forth.

scott preece
ihnp4!uiucdcs!uicsl!preece

emjej@uokvax.UUCP (01/11/84)

#R:fortune:-215700:uokvax:9000015:000:213
uokvax!emjej    Jan  9 10:25:00 1984

Not to foster the use of self-modifying code, but a technique I've
seen used under OS-9, in which you can't put self-modifying code
in a module, is to push code onto the stack and then call it.

					James Jones

ucbesvax.turner@ucbcad.UUCP (01/12/84)

#R:fortune:-215700:ucbesvax:4500007:000:1642
ucbesvax!turner    Jan  5 03:13:00 1984

To mention FORTH in this newsgroup is perhaps only to invite flamage
(con-, and maybe pro-), but I think the problem of self-modifying code
for most cases could be solved by providing a support library for building
up segments of threaded code.  One could then dynamically "compile"
(dreaded FORTH re-definition of the word) and execute interpreted code
that was as much as five times slower than *real* compiled code, but
usually around five times faster than most interpreters written in high-
level languages.

The niches are everywhere: just about every termcap-style device-
description format I've seen has a BUGS entry--"Really should be
compiled."  (This is true of both BSD termcap and a local, device-
independent graphics driver that could use it even more.)  The
interactive function-plotter is another good example.  Database-
query compilers yet another.

With a threaded-code interpreter-kernel linked as a subroutine, and
some macros and routines for segment definition, exception-trapping,
etc., all that is left for many applications is an infix-to-postfix
converter.  Recursive descent or lex/yacc can be used.  Then your code
can be ported to any machine with an implementation of the library. 

Finally, since the threaded-code kernel and support routines can be
defined in a largely machine-independent way, porting the library
itself would involve a fairly small, one-time-only effort.
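The proposed kernel really is small: a threaded-code "segment" is just a sequence of primitive routines plus operands, and the interpreter is one loop. A Python sketch with invented primitives (a real kernel would be machine-level for speed, and the infix-to-postfix converter would emit these segments):

```python
def lit(stack, arg):
    stack.append(arg)

def add(stack, arg):
    stack.append(stack.pop() + stack.pop())

def mul(stack, arg):
    stack.append(stack.pop() * stack.pop())

def run(segment):
    stack = []
    for op, arg in segment:   # the threaded-code inner loop
        op(stack, arg)
    return stack[-1]

# (3 + 4) * 2, as a dynamically built segment:
segment = [(lit, 3), (lit, 4), (add, None), (lit, 2), (mul, None)]
print(run(segment))  # 14
```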

But there's a danger: someone might use it to bring up FORTH on your
machine.  Pretty soon, you'd have weird people all over the place,
late-night drug-deals in the office, the whole bit.
---
Michael Turner (ucbvax!ucbesvax.turner)

brownell@h-aiken.UUCP (Dave Brownell) (01/12/84)

The reason that the "purists" frown upon such self modifying Prolog
code as counters is actually significant.  One of the great hopes for
logic programming is that it be an efficient road into parallel
processing, such as with dataflow computers.  Self modifying code does
not lend itself to parallel execution; training programmers with
techniques that are already on the way out will make those techniques
calcify like COBOL.  Logic programming should force a different mind
set than serial languages; counters and side effects should NOT be
in the domain of discussion.

Actually, you didn't even pick the most common example of SMC in Prolog:
adding facts to the database!!  Here's an example of a case where to
get useful programs written, you NEED to write "self-modifying" code.
Does any other language make you do this?

	Dave Brownell
	Sequoia Systems (posting from Harvard University, Aiken Lab)
	...wjh12!h-aiken!brownell

andree@uokvax.UUCP (01/13/84)

#R:fortune:-215700:uokvax:9000018:000:953
uokvax!andree    Jan 11 20:58:00 1984

Ok, enough. I've got to flame (mildly).

First, under normal circumstances, there is NO good reason to write
self-modifying code. Most of the things posted here could be done
by on-the-fly code creation. This is NOT the same thing, and it can
actually be done cleanly (even in assembler!).

The trouble is, very few languages will let you create and run
code on the fly. You practically HAVE to have an interpretive system,
and not all of those give you this facility. The best language
at this is, of course, LISP. FORTH is nearly as good. SNOBOL doesn't
do too badly either. I can't think of any others off the top of my
head. Are there others?

Final note: `normal circumstances' means that you have the source,
and aren't worried about getting that last microsecond/byte out of
the system. There are cases where you need to do this, and such
things are reasonable to do to meet spec (providing you have nice,
clean, commented HLL source somewhere!).

	<mike

preece@uicsl.UUCP (01/17/84)

#R:fortune:-215700:uicsl:6200007:000:794
uicsl!preece    Jan 16 08:21:00 1984

As with most programming tricks, there is nothing that says that
self-modifying code HAS to be obscure or hard to understand or even
unstructured.  The principal rule must simply be: You must make
clear to the reader exactly what you are doing.  In the case of
self-modifying code this would mean (1) using an appropriate mechanism
and (2) providing comments specifying exactly when the modification
takes place, what the modification does, and where the code is that
performs the modification.

In assembly language it would also be nice, where speed permits, to
stick an Ascii constant (skipped over in the program flow) at the
appropriate place so that the run-time effect is clearly marked (for
the benefit of those debugging without source code).

scott preece
ihnp4!uiucdcs!uicsl!preece

saj@iuvax.UUCP (02/15/84)

#R:fortune:-215700:iuvax:11800008:000:684
iuvax!apratt    Jan  8 00:38:00 1984

I found another use for self-modifying code: speed.  I wrote a graphics
package (a pretty simple-minded one) for the IBM PC, and found that my
line-drawing routine had two spots where one of three things needed doing:
	increment x (or y),
	decrement x (or y),
	do nothing to x (or y).

	I could have loaded the proper add-on value (-1, 0, or 1) into a
memory location or register and used an add instruction, but it was MUCH
faster to have the program "poke" an increment, decrement, or NOP instruction
into the appropriate spot. One (imprecise) comparison (timed with a watch)
showed a 20% speed increase (approx. 4 seconds in 20).
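A portable cousin of the poke is to generate the inner loop with the chosen step baked in, once per line, rather than patching an instruction. A Python sketch of that alternative (invented names; the original patched 8086 code in place):

```python
def make_stepper(dx_op):
    # dx_op is "+ 1", "- 1", or "" -- chosen once, before the loop,
    # so the generated inner loop carries no per-step test.
    src = ("def step(x, n):\n"
           "    out = []\n"
           "    for _ in range(n):\n"
           f"        x = x {dx_op}\n"
           "        out.append(x)\n"
           "    return out\n")
    ns = {}
    exec(src, ns)
    return ns["step"]

inc = make_stepper("+ 1")
dec = make_stepper("- 1")
nop = make_stepper("")
print(inc(0, 3), dec(0, 2), nop(5, 2))
```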
							-- Allan Pratt
						...ihnp4!iuvax!apratt

geoff@proper.UUCP (Geoff Kuenning) (02/23/84)

>    I found another use for self-modifying code: speed.  I wrote a graphics
>    package (a pretty simple-minded one) for the IBM PC, and found that my
>    line-drawing routine had two spots where one of three things needed doing:
>	    increment x (or y),
>	    decrement x (or y),
>	    do nothing to x (or y).
>
>	    I could have loaded the proper add-on value (-1, 0, or 1) into a
>    memory location or register and used an add instruction, but it was MUCH
>    faster to have the program "poke" an increment, decrement, or NOP
>    instruction into the appropriate spot. One (imprecise) comparison
>    (timed with a watch) showed a 20% speed increase (approx. 4 seconds in 20).
>							-- Allan Pratt

Oh, yeah?  When I wrote the same routine, I simply used an 8-way "case".  The
test and branch has to be done anyway to do the self-modifying code, and it's
probably cheaper than storing into an instruction anyway.  And the loop in
question is so short that the amount of space it takes is totally irrelevant
in any computer with more than 4K of memory.

			Geoff Kuenning
			Callan Data Systems
                        {decvax,ucbvax}!ihnp4!sdcrdcf!trwrb!wlbr!callan!geoff