[comp.sys.atari.st] Self-Modifying Code

dmb@wam.UMD.EDU (David M. Baggett) (10/27/89)

Richard Covert writes:

>I have a friend who has written some shareware programs
>(ARCIT, ARCIT Shell, UNARCIT etc ) and he would like to
>make his code self-modifying. In particular he would like
>to save the state of various buttons that he uses.

Just a minor point:

This isn't technically what "self-modifying code" means.  The term
usually refers to executable code that changes/creates executable code
at run-time.  An example might be a bit of code that writes a certain
number of move instructions into a buffer, then jumps into the buffer.

While self-modifying code can occasionally be useful (perhaps for loop
unrolling), it has two major problems:

   1) It's incredibly difficult to understand, modify, debug, and maintain.
   2) It won't work reliably on machines with instruction caches unless
      the cache is flushed after each modification.

Way back when, self-modifying code was thought to be a really powerful 
technique.  I seem to recall there was quite a debate about its merits,
out of which came the "Harvard architecture", where code and data are
kept in separate places (and never the twain shall meet).

Anyway, I think these days that most computer scientists would agree
that it is simply "tricky" and confusing, and should be avoided.

[Richard's stated options:]

>1) Write the data to an external file (*.CFG), but now
>you have to maintain a PRG, a RSC and a CFG file. Clumsy.

I don't see why this is clumsy.  It makes the program much easier to 
maintain.  It would be even nicer if the config file were human-readable
and editable.  C provides fprintf and fscanf for this sort of thing.
If you use these, every C programmer will know what you're doing right
away.  
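
A minimal sketch of the human-readable config file idea, using only
fprintf and fscanf.  The two settings (verbose, overwrite) and the
function names save_config/load_config are hypothetical, chosen for
illustration; nothing here is taken from ARCIT itself.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical settings; the field names are illustrative only. */
struct config {
    int verbose;     /* state of a "verbose" button, 0 or 1 */
    int overwrite;   /* state of an "overwrite" button, 0 or 1 */
};

/* Write the settings as "name value" lines.  Returns 0 on success. */
int save_config(const char *path, const struct config *cfg)
{
    FILE *f;
    int ok;

    assert(path != NULL && cfg != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fprintf(f, "verbose %d\noverwrite %d\n",
                 cfg->verbose, cfg->overwrite) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the settings back.  Returns 0 on success, -1 if the file is
   missing or malformed; *cfg is only updated on success, so the
   caller can keep its defaults otherwise. */
int load_config(const char *path, struct config *cfg)
{
    FILE *f;
    int n, verbose, overwrite;

    assert(path != NULL && cfg != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    n = fscanf(f, " verbose %d overwrite %d", &verbose, &overwrite);
    fclose(f);
    if (n != 2)
        return -1;
    cfg->verbose = verbose;
    cfg->overwrite = overwrite;
    return 0;
}
```

The resulting file is two lines of plain text, so a user can fix it
with any editor, and a missing file simply means "use the defaults".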

>2) Write to the RSC file, but my friend wants to incorporate
>the RSC into the PRG (using RSCTOC or equivalent). So, there
>may not be a RSC to write to.
>
>3) Write to the PRG file. Best of all since you need it to
>run his program!!

No offense, but I really think this is a "sleazy hack".  What's the point of 
doing this?  Then you have to worry about what happens when you 
(heaven forbid) modify the source code and recompile, thereby changing 
the size of the executable.  And, as you pointed out, you have to come
up with some "clever" scheme to make sure you don't clobber something
useful (like executable code) in your binary.  Whatever you come up
with will likely be non-portable as well.

If you don't want people to see the config information, why not put it
in a hidden file?

>[1st suggestion --- store a unique string in the binary and search for it]

Not only is this risky, but it's also going to be slow.  Additionally,
putting things like

	static char	config[1024] = "MAGIC STRING HERE";

in your source seems like one of the easiest ways to bewilder anybody 
looking at the code (including you).
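
For reference, the search half of that suggestion might look roughly
like the sketch below.  It assumes the binary has already been read
into a buffer; find_marker is a hypothetical name.  One plausible
source of risk shows up right away: the marker passed to the search is
itself a string in the program, so the first match in the file is not
guaranteed to be the config array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of marker in buf,
   or -1 if it does not occur.  This is a plain linear scan, so the
   cost grows with the size of the binary. */
long find_marker(const unsigned char *buf, size_t len, const char *marker)
{
    size_t mlen, i;

    assert(buf != NULL && marker != NULL);
    mlen = strlen(marker);
    if (mlen == 0 || mlen > len)
        return -1;
    for (i = 0; i + mlen <= len; i++)
        if (memcmp(buf + i, marker, mlen) == 0)
            return (long)i;
    return -1;
}
```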

>[2nd suggestion --- reserve space at the end of the binary and lseek]

This isn't risky, but it's still slow.  Here you're going to have to
plow through the whole binary to get to your config data.  Is this
an improvement over simply opening a separate file?
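
A rough sketch of what the second suggestion amounts to, written with
stdio's fseek rather than lseek.  CFG_SIZE and the function names are
hypothetical, and the sketch assumes the last CFG_SIZE bytes of the
file really are reserved for config data, which is exactly the
assumption that breaks when the binary's layout changes.  How much the
seek costs depends on how the file system implements it.

```c
#include <assert.h>
#include <stdio.h>

#define CFG_SIZE 16L   /* hypothetical size of the reserved block */

/* Read the last CFG_SIZE bytes of the file into out.
   Returns 0 on success, -1 on any error (including a short file). */
int read_trailer(const char *path, unsigned char *out)
{
    FILE *f;
    int ok;

    assert(path != NULL && out != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    ok = fseek(f, -CFG_SIZE, SEEK_END) == 0
         && fread(out, 1, (size_t)CFG_SIZE, f) == (size_t)CFG_SIZE;
    fclose(f);
    return ok ? 0 : -1;
}

/* Overwrite the last CFG_SIZE bytes of the file in place.
   The file length does not change.  Returns 0 on success. */
int write_trailer(const char *path, const unsigned char *in)
{
    FILE *f;
    int ok;

    assert(path != NULL && in != NULL);
    f = fopen(path, "r+b");
    if (f == NULL)
        return -1;
    ok = fseek(f, -CFG_SIZE, SEEK_END) == 0
         && fwrite(in, 1, (size_t)CFG_SIZE, f) == (size_t)CFG_SIZE;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Compare this with the separate config file: the amount of code is
about the same, but here a mistake corrupts the program itself.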

>I am interested in this whole idea.

It's a nifty concept, but I think in practice you'll find that it isn't
worth the trouble it will cause you.

	David Baggett
arpa:	dmb@tis.com

steve@thelake.UUCP (Steve Yelvington) (10/27/89)

In article <8910270333.AA14741@cscwam.UMD.EDU>,
     dmb@wam.UMD.EDU (David M. Baggett) writes ... 

>Richard Covert writes:

<bunch of stuff about configuration files and self-modifying code deleted>

>>3) Write to the PRG file. Best of all since you need it to
>>run his program!!
>
>No offense, but I really think this is a "sleazy hack".  What's the point of 
>doing this?  Then you have to worry about what happens when you 
>(heaven forbid) modify the source code and recompile, thereby changing 
>the size of the executable.  And, as you pointed out, you have to come
>up with some "clever" scheme to make sure you don't clobber something
>useful (like executable code) in your binary.  Whatever you come up
>with will likely be non-portable as well.

I use MicroEMACS 2.19, a small, fast text editor. I wanted to save the
margin settings and a few other characteristics, but having to load a
configuration file (a) slows down program invocation, and (b) provides yet
another opportunity for something to go wrong, i.e., the configuration
file gets lost. 

I remembered an old CP/M communications program called MEX that had the
ability to "clone" itself -- to write a modified version of the running
program back to disk. 

So I whined at Dale Schumacher, who was handling the MicroEMACS
modifications, until I got him to add such a feature. I assume that it
is indeed nonportable, but it is set off by #ifdef ATARI_ST in the source
code. I don't know any details about the technique. Perhaps Dale can be
persuaded to describe it.

      Steve Yelvington, up at the lake in Minnesota        
  ... pwcs.StPaul.GOV!stag!thelake!steve             (Usenet)   
  ... {playgrnd,moundst,class68}!thelake!steve       (Citadel)

ericco@stew.ssl.berkeley.edu (Eric C. Olson) (10/28/89)

Really Self-Modifying Data

Lisp systems typically have a 'dumplisp' function which dumps an image
of the running system to disk.  Invoking the dumped Lisp then returns you
to the exact environment you dumped.  This is easier to do in Lisp since
it treats its source code as data.

Although modifying the executable file seems bad to me, I think that
modifying the resource file is a completely reasonable (and simple)
solution.  By
parsing the structure of the resource file, your program can quickly
determine which parameter (object) needs to be modified.  In fact, if
your program uses text, then you should put the text in the resource file
as well.  This is how the resource file is supposed to be used.  By putting
the text in the resource file, non-English speaking people can replace it
with meaningful non-English text.

Eric
ericco@sag4.ssl.berkeley.edu


Eric
ericco@ssl.berkeley.edu

7103_2622@uwovax.uwo.ca (Eric Smith) (11/14/89)

In article <89316.201227SML108@PSUVM.BITNET>, SML108@PSUVM.BITNET writes:
> Hi, I am writing an assembly language routine which modifies its own code in
> a tight loop in order to avoid having to do a decision statement at every
> iteration.  Unfortunately, whatever code I am inserting is screwing things up
> royally, and although I have checked it fairly thoroughly, I cannot figure out
> what is going on.  Question:  Is there something screwy about executable and
> object files that would disallow self-modification?  The block that gets
> modified is this:
> 
>          lsr.w   d3
>          bne     cont
>          add.l   #8,a0
>          move.w  #$8000,d3
>   cont:  nop

It would have helped if you had included the code that was doing the
modifications. The 68000 does instruction prefetch. If you're modifying
code that's really close to the instruction that does the modification,
then you can lose (the chip is executing the instruction it prefetched,
rather than the updated instruction in memory). You can get around this
by sticking some nops in. A better solution is to eliminate the self
modification entirely. I *strongly* suggest the latter, because your code will
almost certainly break on the TT (which has a 256 byte instruction cache).
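
One way to drop the self-modification, sketched in C rather than
assembly: make the decision once, outside the loop, and run one of two
fixed loops.  The original post doesn't say what the patched
instruction chose between, so "set the bit" versus "clear the bit" is
an assumption here, and all the function names are hypothetical.  The
mask walk mirrors the quoted block (shift right, and on reaching zero
advance 8 bytes and reload $8000).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Set n consecutive bits, starting at the top bit of *p.  After 16
   bits the pointer advances 4 words (8 bytes), as in the quoted code. */
static void span_set(uint16_t *p, int n)
{
    uint16_t mask = 0x8000u;
    int i;

    for (i = 0; i < n; i++) {
        *p |= mask;
        mask >>= 1;
        if (mask == 0) {
            p += 4;
            mask = 0x8000u;
        }
    }
}

/* Same walk, but clearing bits instead of setting them. */
static void span_clear(uint16_t *p, int n)
{
    uint16_t mask = 0x8000u;
    int i;

    for (i = 0; i < n; i++) {
        *p &= (uint16_t)~mask;
        mask >>= 1;
        if (mask == 0) {
            p += 4;
            mask = 0x8000u;
        }
    }
}

/* The decision is taken once here, not on every iteration, and no
   instruction is ever rewritten. */
void draw_span(uint16_t *p, int n, int set)
{
    assert(p != NULL && n >= 0);
    if (set)
        span_set(p, n);
    else
        span_clear(p, n);
}
```

The cost is one duplicated loop body, which is usually a small price
next to code that breaks whenever a cache or prefetch queue is present.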
--
Eric R. Smith                     email:
Dept. of Mathematics            ERSMITH@uwovax.uwo.ca
University of Western Ontario   ERSMITH@uwovax.bitnet
London, Ont. Canada N6A 5B7
ph: (519) 661-3638

apratt@atari.UUCP (Allan Pratt) (11/15/89)

7103_2622@uwovax.uwo.ca (Eric Smith) writes:
>In article <89316.201227SML108@PSUVM.BITNET>, SML108@PSUVM.BITNET writes:
>> [...things about self-modifying code...]
>It would have helped if you had included the code that was doing the
>modifications. The 68000 does instruction prefetch.

Yeah, and (as Eric points out) on a TT it will get you in BIG TROUBLE.
Writing to something as data and reading it as code is a BIG NO-NO
unless you invalidate the cache in between.  In fact, on the 68030,
writing something in User mode and reading it in Super mode would
confuse the cache, were it not for a side effect of the write-allocate
bit.

People who do DMA into memory have to worry about that - the BIOS
tries to help, but you can still get in trouble.  Your DMA driver
should execute the following instructions to clobber the cache
after a DMA read and before anybody actually looks at the data:

	movec.l	cacr,d0		; get current cache control register value
	or.w	#$808,d0	; set both "clear" bits
	movec.l	d0,cacr		; write this new value back

The clear bits in the cacr are one-shots, so you don't have to clear
them again.  The code above is harmless if the cache isn't enabled in
the first place, as it doesn't change the enable or other state bits.
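
To spell out what the $808 mask does, here is the same arithmetic in C,
operating on a plain integer rather than the real register.  The bit
names follow the 68030 CACR layout (CI is bit 3, CD is bit 11, EI is
bit 0, ED is bit 8); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CACR_EI 0x0001u   /* enable instruction cache */
#define CACR_CI 0x0008u   /* clear instruction cache (one-shot) */
#define CACR_ED 0x0100u   /* enable data cache */
#define CACR_CD 0x0800u   /* clear data cache (one-shot) */

/* Sanity check: the two clear bits together are the $808 used above. */
void check_clear_mask(void)
{
    assert((CACR_CI | CACR_CD) == 0x0808u);
}

/* Value to write back to the CACR: the current value with both clear
   bits set.  OR-ing leaves the enable bits, and every other bit,
   exactly as they were. */
uint32_t cacr_with_clear(uint32_t cacr)
{
    return cacr | CACR_CI | CACR_CD;
}
```

With both caches enabled (value $0101), the value written back is
$0909; with both disabled ($0000), it is $0808.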

The TT is going to open a whole new can of worms, people.  We've been
dealing with it internally, of course, and TOS runs fine, but there are
things which you could get away with on the ST which you can't do on
the TT.  For example, some programs use the high byte of a pointer for
something; with a 24-bit address bus, that's harmless.  But with a full
32-bit bus, that gets you in trouble.
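
A small model of the pointer trick, using integers in place of real
pointers.  The function names are hypothetical; the point is only that
a 24-bit bus ignores the top byte while a 32-bit bus does not.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR24_MASK ((uint32_t)0x00FFFFFFu)

/* Stash an 8-bit tag in the top byte of a 24-bit address. */
uint32_t tag_pointer(uint32_t addr, uint8_t tag)
{
    assert((addr & ~ADDR24_MASK) == 0);   /* address must fit in 24 bits */
    return addr | ((uint32_t)tag << 24);
}

/* Address seen by memory with a 24-bit bus: the top byte never leaves
   the chip, so the tag is silently dropped. */
uint32_t bus_address_24(uint32_t value)
{
    return value & ADDR24_MASK;
}

/* Address seen by memory with a full 32-bit bus: the tag is now part
   of the address. */
uint32_t bus_address_32(uint32_t value)
{
    return value;
}
```

Tagging address $012345 with $80 gives $80012345.  The 24-bit bus still
reaches $012345, but the 32-bit bus goes looking at $80012345, which is
somewhere else entirely.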

============================================
Opinions expressed above do not necessarily	-- Allan Pratt, Atari Corp.
reflect those of Atari Corp. or anyone else.	  ...ames!atari!apratt