[comp.sys.amiga] What's wrong with Self-Modifying Code?

kosma%human-torch@stc.lockheed.com (Monty Kosma) (07/13/90)

   From: Glenn Patrick Steffler <gpsteffl@sunee.waterloo.edu>
   Newsgroups: comp.sys.amiga.tech
   Keywords: religion, gurus, whats up?
   Date: 9 Jul 90 16:36:07 GMT
   Organization: Gerbils On Speed Inc.
   Sender: amiga-relay-request@udel.edu

   I have been waiting for people to show all of the great ways to avoid
   self modifying code.  But the examples have been contrived and not the
   least instructive.

   Let's take a real-world example:

   A spreadsheet program which must perform several thousand iterations
   of a formula while recalculating.  The formula can be "compiled" to
   the stack, and run such that the execution time is considerably less
   than if the formula had been interpreted each time.

   A video device driver or some such that uses raster ops to write values
   to the video display.  (Given that the Amiga has a blitter with this
   ability, I do not ask for confirmation of the relevance of this argument.)
   The driver can "compile" a raster operation fill algorithm into a small
   code segment and run it.  This is indeed self-modifying code, but it is
   almost essential for speed, because the user hates to wait for a screen
   refresh.

   Anyway, that was just some fodder for you guys.  I submit this for
   rationalization or destruction.

I would think that these examples represent "code-generating" rather
than "code-modifying."  Why not just create the new code segments off in 
memory somewhere and then jump to that address?  Would that be illegal?
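
The "code-generating" reading of the spreadsheet example can be sketched in a few lines. This is only an illustration in Python, not Amiga code: the formula text, the variable names and the helper names `interpret` and `generate` are all made up for the example. The point is that the formula is translated once into a fresh code object, which is then run many times, and nothing that is already executing gets patched.

```python
def interpret(formula, env):
    """Slow path: the formula text is re-parsed on every single call."""
    return eval(formula, {"__builtins__": {}}, env)

def generate(formula):
    """Translate the formula once into a new code object and return a callable."""
    code = compile(formula, "<cell>", "eval")   # new code, created off to the side
    def run(env):
        return eval(code, {"__builtins__": {}}, env)
    return run

formula = "a * b + c"        # hypothetical cell formula
cell = generate(formula)     # generated once ...
total = sum(cell({"a": i, "b": 2, "c": 1}) for i in range(1000))  # ... run 1000 times
```

Both paths give the same answers; the generated version simply skips the parsing work on every iteration after the first.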

jnmoyne@lbl.gov (Jean-Noel MOYNE) (07/13/90)

       The problem with self-modifying code is that if your system has an
instruction cache, modifying code in memory does not update any copy of
that code still held in the cache, and the CPU will go on executing the
unmodified code from its cache.
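
The stale-cache effect described here can be shown with a toy model. This is a sketch in Python under simplifying assumptions, not a model of any real 680x0: the `Machine` class, its one-instruction-per-address memory and the mnemonics used as values are all invented for illustration.

```python
class Machine:
    """Toy CPU: main memory plus an instruction cache that ignores data writes."""

    def __init__(self, memory):
        self.memory = list(memory)   # "RAM": one instruction per address
        self.icache = {}             # address -> cached copy of the instruction

    def fetch(self, addr):
        # Instruction fetches are served from the cache once an address is cached.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        # Data writes reach memory only; the instruction cache is left untouched.
        self.memory[addr] = value

m = Machine(["ADD", "RTS"])
first = m.fetch(0)       # executes and caches the original instruction
m.store(0, "SUB")        # the program patches its own code in memory
second = m.fetch(0)      # fetch is still served from the cache
```

After the patch, memory holds the new instruction while the fetch path keeps returning the old one, which is exactly the failure the post describes.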

       So, to run such software on 020s and 030s, you have to turn the
instruction cache off (which will hardly increase the speed, and speed is
what you were after). And since you cannot make assumptions about what is
in the cache at any given moment, the rule is: don't write self-modifying
code. Sure, if you modify code which is 512 KB away from the current PC
position, you may think it's not in the cache, but you're still making
assumptions ...

     Anyway, you can always write non-self-modifying code which is fast
enough, especially on a 68000 (or a 68020 or better), for all sorts of
speedy applications. Self-modifying code could be useful on 8-bit CPUs, but
if you look a little harder you can live without it on the Amiga.


   JNM

new@udel.EDU (Darren New) (07/13/90)

In article <6171@helios.ee.lbl.gov> jnmoyne@lbl.gov (Jean-Noel MOYNE) writes:
>     Anyway, you can always write non-self-modifying code which is fast
>enough,

Sometimes it's not just speed.  What about a Forth interpreter with CODE words?
These functions will compile down to machine code which may be run immediately
after compilation into memory. There MUST be an approved way to flush the cache 
without going through LoadSeg() or there will be some programs that just can't
be done.   Is it in there?   1/2 :-)                 -- Darren

king@motcid.UUCP (Steven King) (07/14/90)

Aside from other considerations, self-modifying code is a true nightmare
to maintain.  If you're going to do a one-shot demo or something like that,
great.  Do what you want to get every last cycle of performance you can out
of the system.  On the other hand, if you're writing something that has
a long lifecycle, something that you're going to be upgrading and keeping
around for years, self-modifying code is probably the worst thing imaginable
to try to change.  It's very hard to thoroughly debug and test.  And if
you move on to another project and someone ELSE has to take it over...
Things degrade rapidly.

-- 
---------------------------------------------------+---------------------------
It's only impossible until it's done.              | Steve King  (708) 991-8056
                                                   |   ...uunet!motcid!king
                                                   |   ...ddsw1!palnet!stevek

rod@venera.isi.edu (Rodney Doyle Van Meter III) (07/14/90)

I understand WHY self-modifying code won't work, but nobody has said
HOW to get around it, assuming you're compiling code into memory to
be executed.

Is there a way to simply flush the caches that won't take a gazillion
bus cycles? I work in an interactive compiled environment, so we need
to do something like this once per user input: accept it, compile it
to a (fixed) memory space well apart from the compiler itself, then
jump to it, and, eventually, return to the user input routine. A cache
flush might be okay, if it takes less than 50 milliseconds or so.
Simply turning the caches off is unlikely to be an acceptable solution.

We're hoping to hit the 68030 (Suns first, then I'm pushing for Amiga),
then the MIPS box.

				--Rod

jnmoyne@lbl.gov (Jean-Noel MOYNE) (07/14/90)

In article <14280@venera.isi.edu> rod@venera.isi.edu (Rodney Doyle Van 
Meter III) writes:
>  A cache
> flush might be okay, if it takes less than 50 milliseconds or so.
> Simply turning the caches off is unlikely to be an acceptable solution.

     So this is a simple calculation: the 68030's built-in caches are 256
bytes each (as I remember). But the 68040 is another story, as its caches
are 4 Kbytes each (correct me if I'm wrong, I'm not completely sure (-:)
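
The "simple calculation" can be written out as follows. This is a rough sketch in Python, and only the two cache sizes come from the post. The clock rate, the clocks per memory access and the idea of pricing a flush as the cost of refetching a full cache afterwards are assumptions chosen to be pessimistic, not measured figures.

```python
CACHE_BYTES_030 = 256          # from the post above
CACHE_BYTES_040 = 4096         # from the post above
BYTES_PER_FETCH = 4            # assumed: one longword per bus access, no bursts
CLOCKS_PER_FETCH = 10          # assumed, deliberately slow memory
CLOCK_HZ = 25_000_000          # assumed 25 MHz CPU clock
BUDGET_US = 50_000             # the 50 millisecond budget asked about

def refill_microseconds(cache_bytes):
    """Time to refetch one full cache's worth of instructions after a flush."""
    fetches = cache_bytes // BYTES_PER_FETCH
    return fetches * CLOCKS_PER_FETCH * 1_000_000 / CLOCK_HZ

refill_030_us = refill_microseconds(CACHE_BYTES_030)
refill_040_us = refill_microseconds(CACHE_BYTES_040)
```

Under these assumptions both refills come out in the tens to hundreds of microseconds, orders of magnitude below the 50 millisecond budget.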

Sullivan@cup.portal.com (sullivan - segall) (07/15/90)

>In article <14280@venera.isi.edu> rod@venera.isi.edu (Rodney Doyle Van 
>Meter III) writes:
>>  A cache
>> flush might be okay, if it takes less than 50 milliseconds or so.
>> Simply turning the caches off is unlikely to be an acceptable solution.
>
You can turn off the caches independently.  And it may be the only acceptable
solution.  Memory caches aren't likely to be all that effective if you 
are jumping to locations that have to be flushed anyway.  Your other option
is to define the memory that is going to be used for modified code as 
uncacheable.  (Which it is. But which is not supported under AmigaDOS.)

>     So this is a simple calculation: the 68030's built-in caches are 256
>bytes each (as I remember). But the 68040 is another story, as its caches
>are 4 Kbytes each (correct me if I'm wrong, I'm not completely sure (-:)

The 68040 snoops the address bus for writes to cached memory, and updates
the cache if it sees any.  (I.e., you shouldn't ever have to flush the 
cache on any processor >= 68040.)  

 
                           -Sullivan Segall
_________________________________________________________________
 
/V\  Sullivan  was the first to learn how to jump  without moving.
 '   Is it not proper that the student should surpass the teacher?
To Quote the immortal Socrates: "I drank what?" -Sullivan
_________________________________________________________________
 
Mail to: ...sun!portal!cup.portal.com!Sullivan or
         Sullivan@cup.portal.com

dbk@teroach.UUCP (Dave Kinzer) (07/15/90)

In article <31727@cup.portal.com> Sullivan@cup.portal.com (sullivan - segall) writes:
>The 68040 snoops the address bus for writes to cached memory, and updates
>the cache if it sees any.  (I.e., you shouldn't ever have to flush the 
>cache on any processor >= 68040.)  

BZZZZZZZZZZZZZZT!   Sorry, but we have some lovely parting gifts...

   That the 68040 snoops anything it sees on the bus is true; the problem is
that all the blitter activity (and possibly other DMA in different
configurations) is hidden from the processor (allowing the processor to do
other things.)

   The correct answer is that anytime, and that means ANYTIME, the memory
is loaded with instructions, the cache needs to be flushed.  This should
be a system call since the application program usually should know nothing
about the processor executing its code, with the system call handling the
individual system requirements.  I tried to find one in my 1.2 docs (I
thought I remembered one), but failed.  For 2.0, I expect "It's in 
there."  I left enough hints here that I wanted it.

   More years ago than I care to remember, I had an instructor point out
that all we had ever written was self-modifying code.  Now this instructor
had drilled into us through the many courses we had taken that this was a
definite no-no, and I had never stooped so low as to do it, so I was taken
aback by this statement.  What do you mean, self-modifying?  None of our
programs would have passed!  The point of view he was taking was that
of the operating system, and that when our program was read into system
memory, the system's code was modified.  This subtle point was never lost
on me.  The system, of course, flushed the cache after loading (if so
equipped.)  You should too.
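
The "load, then flush, then run" rule can be illustrated with a toy model. This is a Python sketch, not Amiga code: the `Machine` class, its `flush_icache` method and the `load_and_run` helper are hypothetical names standing in for whatever the system provides. (On the Amiga, exec.library in 2.0 did gain CacheClearU() and CacheClearE() calls for this job.)

```python
class Machine:
    """Toy CPU: memory plus an instruction cache that ignores data writes."""

    def __init__(self, memory):
        self.memory = list(memory)
        self.icache = {}             # address -> cached instruction

    def fetch(self, addr):
        # Serve instruction fetches from the cache, filling it on a miss.
        if addr not in self.icache:
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def store(self, addr, value):
        # Data writes go to memory only.
        self.memory[addr] = value

    def flush_icache(self):
        # Stand-in for a system-provided cache flush call.
        self.icache.clear()

def load_and_run(machine, addr, code):
    """Hypothetical loader: copy code into memory, flush, then fetch it."""
    for offset, instr in enumerate(code):
        machine.store(addr + offset, instr)
    machine.flush_icache()           # required before the new code is executed
    return [machine.fetch(addr + i) for i in range(len(code))]

m = Machine(["NOP"] * 4)
m.fetch(0)
m.fetch(1)                           # the old contents are now cached
executed = load_and_run(m, 0, ["MOVEQ", "RTS"])
```

Without the `flush_icache()` call the two fetches would return the cached `NOP`s instead of the freshly loaded code.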

            * * *   Imminent use of deathnet predicted.   * * *             //
Dave Kinzer  (602)897-3085  asuvax!mcdphx!teroach!dbk  Opinions are mine. \X/

Sullivan@cup.portal.com (sullivan - segall) (07/16/90)

>>The 68040 snoops the address bus for writes to cached memory, and updates
>>the cache if it sees any.  (I.e., you shouldn't ever have to flush the 
>>cache on any processor >= 68040.)  
>
>BZZZZZZZZZZZZZZT!   Sorry, but we have some lovely parting gifts...
>
>   That the 68040 snoops anything it sees on the bus is true; the problem is
>that all the blitter activity (and possibly other DMA in different
>configurations) is hidden from the processor (allowing the processor to do
>other things.)
>
If I am writing self-modifying code, I am hardly going to use the blitter
to make the modifications.  (Unless it is copper code, which doesn't have
a cache anyway.)  

The 68040 will always update the execution cache for its own memory writes.
Now whether or not it will update correctly when another device writes to
memory (ostensibly since the Agnus or Ramsey is always in the way) isn't 
something I would know about.

It would seem a terrible shame though to lose that capability in the 68040,
and have to clear 8k worth of cache any time a DMA completes.

>   The correct answer is that anytime, and that means ANYTIME, the memory
>is loaded with instructions, the cache needs to be flushed.  This should
>be a system call since the application program usually should know nothing
>about the processor executing its code, with the system call handling the
>individual system requirements.  I tried to find one in my 1.2 docs (I
>thought I remembered one), but failed.  For 2.0, I expect "It's in 
>there."  I left enough hints here that I wanted it.

In the real world this isn't an acceptable practice.  If I need to run 
self modifying code for speed, I won't have time to make an OS call.  Or
to put it differently, if I had the time to make an OS call, I wouldn't
need to use self modifying code, I could just jump to the appropriate 
vector.  

The only exception I can imagine, where an OS routine of this sort would
be useful, is when my program needs to compile something directly into 
memory.  In that case I still think it would be faster to either define
that memory segment as uncacheable, or (at least temporarily) turn off 
the execution cache entirely.
>
>   More years ago than I care to remember, I had an instructor point out
>that all we had ever written was self-modifying code.  Now this instructor
>had drilled into us through the many courses we had taken that this was a
>definite no-no, and I had never stooped so low as to do it, so I was taken
>aback by this statement.  What do you mean, self-modifying?  None of our
>programs would have passed!  The point of view he was taking was that
>of the operating system, and that when our program was read into system
>memory, the system's code was modified.  This subtle point was never lost
>on me.  The system, of course, flushed the cache after loading (if so
>equipped.)  You should too.
>
How about some new MEMF_ bits.

	MEMF_ECACHE
	MEMF_DCACHE

where the default is TRUE for DCACHE , and FALSE for ECACHE.  That would
tend to leave the exec() code in the cache (where IMHO it belongs) and 
user programs would be cached if they specifically request it.  (If you
know you are going to be using the CPU for a long time, at a higher priority
than anything else on the system, you may as well use the cache.)  And 
programs that do need cache speed would have a better chance of getting 
it (even after a task switch.)

                           -Sullivan Segall

<LEEK@QUCDN.QueensU.CA> (07/17/90)

In article <31749@cup.portal.com>, Sullivan@cup.portal.com (sullivan - segall)
says:
>
>It would seem a terrible shame though to lose that capability in the 68040,
>and have to clear 8k worth of cache any time a DMA completes.
Yes, but that has nothing to do with self-modifying code... That's an OS
problem.

... stuff referring to previous article deleted....

>In the real world this isn't an acceptable practice.  If I need to run
>self modifying code for speed, I won't have time to make an OS call.  Or
>to put it differently, if I had the time to make an OS call, I wouldn't
>need to use self modifying code, I could just jump to the appropriate
>vector.

Turning off or flushing a large cache might be a larger performance loss
than just doing vector jumps.  Why one would want to save a few machine
cycles while at the same time running the CPU 20% - 300% slower
(depending on program and CPU) is beyond my comprehension.  The major speed
advantage of the 020, 030, etc. is due to the caches.  Make sure you sit down
and count the machine cycles for both cases (i.e. cache on with well-behaved
code, and cache off with self-modifying code) before making any assumptions.

Flushing 4K worth of instruction cache in an 040 is probably bad.  If the
amount of time saved justifies it, then go for it.  You'll get better
performance than by turning off the cache entirely.  Do the OS call, as its
cost is negligible compared with flushing the cache !!
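
The "sit down with the machine cycles" advice amounts to a comparison like the one below. This is a Python sketch with made-up numbers: the loop body length, the jump overhead and the flush cost are assumptions for illustration only. The 20% slowdown is the low end of the range quoted above.

```python
def cycles_vector_jump(iterations, body_cycles, jump_overhead):
    """Well-behaved code: cache on, one extra indirect jump per iteration."""
    return iterations * (body_cycles + jump_overhead)

def cycles_cache_off(iterations, body_cycles, slowdown_percent):
    """Self-modifying code with the instruction cache disabled throughout."""
    return iterations * body_cycles * (100 + slowdown_percent) // 100

def cycles_flush_once(iterations, body_cycles, flush_and_refill):
    """Generate the code, flush once, then run with the cache on."""
    return flush_and_refill + iterations * body_cycles

ITERATIONS = 1000        # assumed loop count
BODY = 40                # assumed cycles per iteration with the cache on
vector = cycles_vector_jump(ITERATIONS, BODY, jump_overhead=4)         # assumed overhead
cache_off = cycles_cache_off(ITERATIONS, BODY, slowdown_percent=20)    # low end of 20%-300%
flush_once = cycles_flush_once(ITERATIONS, BODY, flush_and_refill=700) # assumed flush cost
```

With these particular figures, running with the cache off is the slowest option even at the mildest slowdown, and a single flush costs less than paying the jump overhead on every iteration. Different assumptions can reorder the last two, which is the reason to do the count for the real code.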
>
>The only exception I can imagine, where an OS routine of this sort would
>be useful, is when my program needs to compile something directly into
>memory.  In that case I still think it would be faster to either define
>that memory segment as uncacheable, or (at least temporarily) turn off
>the execution cache entirely.

Read above for disadvantages of turning off cache...
Might be better to use the RAM: drive.
>
>        MEMF_ECACHE
>        MEMF_DCACHE
>
>where the default is TRUE for DCACHE , and FALSE for ECACHE.  That would
>tend to leave the exec() code in the cache (where IMHO it belongs) and
>user programs would be cached if they specifically request it.  (If you
>know you are going to be using the CPU for a long time, at a higher priority
>than anything else on the system, you may as well use the cache.)  And
>programs that do need cache speed would have a better chance of getting
>it (even after a task switch.)
>
>                           -Sullivan Segall
>

K. C. Lee

daveh@cbmvax.commodore.com (Dave Haynie) (07/18/90)

In article <31727@cup.portal.com> Sullivan@cup.portal.com (sullivan - segall) writes:

>The 68040 snoops the address bus for writes to cached memory, and updates
>the cache if it sees any.  (I.e., you shouldn't ever have to flush the 
>cache on any processor >= 68040.)  

The 68040 can be set up to snoop the CPU bus in a system.  This kind of bus
snooping is a real good idea for fully snooped system designs, but can't
always be used as a drop-in for any existing system.  However, this kind of 
snooping does NOT imply that the 68040 will snoop itself -- eg, it doesn't
necessarily follow that the I-cache will be invalidated by an aliased 
write to D-space.

Interestingly enough, several years ago, before the MC68851 was released,
Commodore was working on a cache+MMU chipset.  It was never completed, but
promised to deliver a 0 wait 2K logical cache to a 16MHz 68020 with snooped,
copyback capability and burst fetches.  The internals were different, and 
the cache was of course a unified logical 2-set associative cache rather than
separate physical 4-set associative caches, but it did wind up looking a
little more like the 68040, at least philosophically, than one might have 
expected.

>                           -Sullivan Segall

-- 
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
	"I have been given the freedom to do as I see fit" -REM