kosma%human-torch@stc.lockheed.com (Monty Kosma) (07/13/90)
From: Glenn Patrick Steffler <gpsteffl@sunee.waterloo.edu>
Newsgroups: comp.sys.amiga.tech
Keywords: religion, gurus, whats up?
Date: 9 Jul 90 16:36:07 GMT
Organization: Gerbils On Speed Inc.
Sender: amiga-relay-request@udel.edu

I have been waiting for people to show all of the great ways to avoid self-modifying code. But the examples have been contrived and not the least instructive. Let's take a real-world example:

A spreadsheet program which must perform several thousand iterations of a formula while recalculating. The formula can be "compiled" to the stack and run, so that the execution time is considerably less than if the formula had been interpreted each time.

A video device driver or some such that uses raster ops to write values to the video display. (Given that the Amiga has a blitter with this ability, I do not ask for confirmation of the relevance of this argument.) The driver can "compile" a raster operation fill algorithm into some small code segment and run it. This is indeed self-modifying code, but it is almost essential for speed, because the user hates to wait for screen refresh.

Anyway, that was just some fodder for you guys. I submit this for rationalization or destruction.

I would think that these examples represent "code-generating" rather than "code-modifying." Why not just create the new code segments off in memory somewhere and then jump to that address? Would that be illegal?
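The distinction between generating code and modifying it can be sketched in a high-level language. The following is only an analogy, not 68000 code: it uses Python's built-in `compile()` to turn a spreadsheet formula into a fresh code object once, then calls into it for every row, without patching anything that is already running. The names `compile_formula`, `recalc_interpreted` and `recalc_compiled` are made up for this illustration.

```python
def compile_formula(source):
    """Translate the formula text once and return a callable for it."""
    # One-time parse and code generation into a brand-new code object.
    code = compile(source, "<formula>", "eval")
    no_builtins = {"__builtins__": {}}

    def run(cells):
        # "Jump" to the generated code with this row's cell values.
        return eval(code, no_builtins, cells)

    return run


def recalc_interpreted(source, rows):
    """Re-parse the formula text for every single row."""
    return [eval(source, {"__builtins__": {}}, row) for row in rows]


def recalc_compiled(source, rows):
    """Generate the code once, then only execute it for each row."""
    formula = compile_formula(source)
    return [formula(row) for row in rows]


rows = [{"a": i, "b": 2} for i in range(5)]
results = recalc_compiled("a * b + 1", rows)
```

Both functions return the same values; the compiled variant simply pays the translation cost once instead of once per row, which is the saving the spreadsheet example is after.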
jnmoyne@lbl.gov (Jean-Noel MOYNE) (07/13/90)
The problem with self-modifying code is that if your system has an instruction cache, modifying code in memory that is still in the cache won't modify the copy in the cache, and the CPU will keep executing the unmodified code from its cache. So, to run such software on 020s and 030s, you have to turn the instruction cache off (which is not likely to increase speed, and speed is what you were looking for). And since you cannot make assumptions about what is in the cache at a given moment, the rule is: don't write any self-modifying code. Sure, if you modify code which is 512 KB away from the current PC position, you can assume that it's not in the cache, but you're still making suppositions ...

Anyway, you can always write non-self-modifying code which is fast enough, especially on a 68000 (or a 68020 or better), for all sorts of speedy applications. Self-modifying code could be useful on 8-bit CPUs, but if you search a little more you can live without it on the Amiga.

JNM
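The stale-cache failure described above can be shown with a toy model. This is a deliberately simplified sketch, not a model of any real 680x0 cache: `ToyMachine` and its methods are invented names, "instructions" are just strings, and the cache is a dictionary that is filled on a miss and never updated by writes.

```python
class ToyMachine:
    """Memory holds instructions; the instruction cache keeps private copies."""

    def __init__(self, program):
        self.memory = list(program)
        self.icache = {}  # address -> cached copy of the instruction

    def fetch(self, address):
        if address not in self.icache:
            # Miss: fill the cache line from memory.
            self.icache[address] = self.memory[address]
        # Hit: memory is not consulted at all.
        return self.icache[address]

    def write(self, address, instruction):
        # A data write changes memory but leaves the instruction cache alone.
        self.memory[address] = instruction

    def flush_icache(self):
        self.icache.clear()


machine = ToyMachine(["ADD", "RTS"])
first = machine.fetch(0)    # executes the original instruction and caches it
machine.write(0, "SUB")     # the program modifies its own code
stale = machine.fetch(0)    # the CPU still sees the old cached copy
machine.flush_icache()
fresh = machine.fetch(0)    # after a flush the new instruction is fetched
```

In this model `stale` is still the old instruction and only `fresh` reflects the write, which is exactly the behavior that makes unflushed self-modifying code unreliable.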
new@udel.EDU (Darren New) (07/13/90)
In article <6171@helios.ee.lbl.gov> jnmoyne@lbl.gov (Jean-Noel MOYNE) writes:
> Anyway you can always write non-self-modifying code which is fast
>enough,

Sometimes it's not just speed. What about a Forth interpreter with CODE words? These functions will compile down to machine code which may be run immediately after compilation into memory. There MUST be an approved way to flush the cache without going through LoadSeg() or there will be some programs that just can't be done. Is it in there? 1/2 :-)

-- Darren
king@motcid.UUCP (Steven King) (07/14/90)
Aside from other considerations, self-modifying code is a true nightmare to maintain. If you're going to do a one-shot demo or something like that, great. Do what you want to get every last cycle of performance you can out of the system.

On the other hand, if you're writing something that has a long lifecycle, something that you're going to be upgrading and keeping around for years, self-modifying code is probably the worst thing imaginable to try to change. It's very hard to thoroughly debug and test. And if you move on to another project and someone ELSE has to take it over... Things degrade rapidly.

--
---------------------------------------------------+---------------------------
It's only impossible until it's done.              | Steve King  (708) 991-8056
                                                   | ...uunet!motcid!king
                                                   | ...ddsw1!palnet!stevek
rod@venera.isi.edu (Rodney Doyle Van Meter III) (07/14/90)
I understand WHY self-modifying code won't work, but nobody has said HOW to get around it, assuming you're compiling code into memory to be executed. Is there a way to simply flush the caches that won't take a gazillion bus cycles?

I work in an interactive compiled environment, so we need to do something like this once per user input: accept it, compile it to a (fixed) memory space well apart from the compiler itself, then jump to it, and, eventually, return to the user input routine. A cache flush might be okay, if it takes less than 50 milliseconds or so. Simply turning the caches off is unlikely to be an acceptable solution.

We're hoping to hit the 68030 (Suns first, then I'm pushing for Amiga), then the MIPS box.

--Rod
jnmoyne@lbl.gov (Jean-Noel MOYNE) (07/14/90)
In article <14280@venera.isi.edu> rod@venera.isi.edu (Rodney Doyle Van Meter III) writes:
> A cache
> flush might be okay, if it takes less than 50 milliseconds or so.
> Simply turning the caches off is unlikely to be an acceptable solution.

So this is a simple calculation: the 68030's built-in caches are 256 bytes each (as I remember). But if you take the 68040, it's another story, as they are 4 Kbytes each (correct me if I'm wrong, I'm not completely sure (-:)
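The "simple calculation" can be sketched as a rough upper bound on the work of refilling a cache after a flush. Only the cache sizes and the 50 millisecond budget come from the thread; the fetch width, the clocks per fetch and the clock rate below are placeholder assumptions that must be replaced with figures for the actual machine. The cost of the invalidate operation itself is not modeled, and in practice the refill happens gradually as code runs rather than all at once.

```python
BUDGET_US = 50_000  # the 50 millisecond budget quoted above, in microseconds


def refill_time_us(cache_bytes, bytes_per_fetch=4, clocks_per_fetch=4, clock_mhz=25.0):
    """Time to refetch a completely emptied cache, in microseconds.

    bytes_per_fetch, clocks_per_fetch and clock_mhz are assumed values,
    not measurements of any particular system.
    """
    fetches = cache_bytes // bytes_per_fetch
    # clock_mhz is the number of clocks per microsecond.
    return fetches * clocks_per_fetch / clock_mhz


small_cache_us = refill_time_us(256)    # 256-byte cache
large_cache_us = refill_time_us(4096)   # 4 Kbyte cache
fits_budget = large_cache_us < BUDGET_US
```

With these placeholder inputs the refill comes to roughly 10 microseconds for 256 bytes and roughly 164 microseconds for 4 Kbytes, both far below the 50 millisecond budget; different bus timings would change the figures but would have to be several hundred times worse to change the conclusion.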
Sullivan@cup.portal.com (sullivan - segall) (07/15/90)
>In article <14280@venera.isi.edu> rod@venera.isi.edu (Rodney Doyle Van
>Meter III) writes:
>> A cache
>> flush might be okay, if it takes less than 50 milliseconds or so.
>> Simply turning the caches off is unlikely to be an acceptable solution.
>

You can turn off the caches independently. And it may be the only acceptable solution. Memory caches aren't likely to be all that effective if you are jumping to locations that have to be flushed anyway. Your other option is to define the memory that is going to be used for modified code as uncacheable. (Which it is. But which is not supported under AmigaDOS.)

> So this is a simple calculation: the 68030 built-in caches are 256
>bytes big (as I remember). But if you take the 68040, it's another story
>as they are 4 Kbytes big (correct me if I'm wrong, I'm not completely sure
>(-:)

The 68040 snoops the address bus for writes to cached memory, and updates the cache if it sees any. (I.e., you shouldn't ever have to flush the cache on any processor >= 68040.)

                           -Sullivan Segall
_________________________________________________________________

/V\  Sullivan was the first to learn how to jump without moving.
 '   Is it not proper that the student should surpass the teacher?
To Quote the immortal Socrates: "I drank what?" -Sullivan
_________________________________________________________________

Mail to: ...sun!portal!cup.portal.com!Sullivan or
         Sullivan@cup.portal.com
dbk@teroach.UUCP (Dave Kinzer) (07/15/90)
In article <31727@cup.portal.com> Sullivan@cup.portal.com (sullivan - segall) writes:
>The 68040 snoops the address bus for writes to cached memory, and updates
>the cache if it sees any. (I.e., you shouldn't ever have to flush the
>cache on any processor >= 68040.)

BZZZZZZZZZZZZZZT! Sorry, but we have some lovely parting gifts...

That the 68040 snoops anything it sees on the bus is true; the problem is that all the blitter activity (and possibly other DMA in different configurations) is hidden from the processor (allowing the processor to do other things).

The correct answer is that anytime, and that means ANYTIME, the memory is loaded with instructions, the cache needs to be flushed. This should be a system call, since the application program usually should know nothing about the processor executing its code, with the system call handling the individual system requirements. I tried to find one in my 1.2 docs (I thought I remembered one), but failed. For 2.0, I expect "It's in there." I left enough hints here that I wanted it.

More years ago than I care to remember, I had an instructor point out that all we had ever written was self-modifying code. Now this instructor had drilled into us, through the many courses we had taken, that this was a definite no-no, and I had never stooped so low as to do it, so I was taken aback by this statement. What do you mean, self-modifying? None of our programs would have passed! The point of view he was taking was that of the operating system: when our program was read into system memory, the system's code was modified. This subtle point was never lost on me. The system, of course, flushed the cache after loading (if so equipped). You should too.

* * * Imminent use of deathnet predicted. * * *

// Dave Kinzer (602)897-3085 asuvax!mcdphx!teroach!dbk
Opinions are mine. \X/
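The idea of a single system call that hides the processor-specific details can be sketched as a dispatch table. Everything here is hypothetical: the function names, the dictionary-based "machine" and the table of CPU types are illustration only and do not describe any actual operating-system interface.

```python
# Hypothetical per-CPU cache maintenance routines; names are assumptions.
def _nothing_to_flush(machine):
    pass  # e.g. a plain 68000, which has no instruction cache


def _invalidate_icache(machine):
    machine["icache"].clear()


FLUSH_FOR_CPU = {
    "68000": _nothing_to_flush,
    "68020": _invalidate_icache,
    "68030": _invalidate_icache,
}


def flush_after_code_write(machine):
    """The one call an application makes; CPU details stay behind it."""
    FLUSH_FOR_CPU[machine["cpu"]](machine)


def load_code(machine, address, instructions):
    """Copy instructions into memory, then flush, as a loader would."""
    for offset, instruction in enumerate(instructions):
        machine["memory"][address + offset] = instruction
    flush_after_code_write(machine)


machine = {"cpu": "68030", "memory": {}, "icache": {0: "OLD"}}
load_code(machine, 0, ["NEW", "RTS"])
```

The application only ever calls `load_code` or `flush_after_code_write`; supporting a new processor means adding one table entry rather than changing every program that generates code.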
Sullivan@cup.portal.com (sullivan - segall) (07/16/90)
>>The 68040 snoops the address bus for writes to cached memory, and updates
>>the cache if it sees any. (I.e., you shouldn't ever have to flush the
>>cache on any processor >= 68040.)
>
>BZZZZZZZZZZZZZZT! Sorry, but we have some lovely parting gifts...
>
> That the 68040 snoops anything it sees on the bus is true, the problem is
>that all the blitter (and possibly other DMA in different configurations)
>activity is hidden from the processor (allowing the processor to do other
>things.)
>

If I am writing self-modifying code, I am hardly going to use the blitter to make the modifications. (Unless it is copper code, which doesn't have a cache anyway.) The 68040 will always update the execution cache for its own memory writes. Now whether or not it will update correctly when another device writes to memory (ostensibly since the Agnus or Ramsey is always in the way) isn't something I would know about. It would seem a terrible shame, though, to lose that capability in the 68040 and have to clear 8K worth of cache any time a DMA completes.

> The correct answer is that anytime, and that means ANYTIME, the memory
>is loaded with instructions, the cache needs to be flushed. This should
>be a system call since the application program usually should know nothing
>about the processor executing its code, with the system call handling the
>individual system requirements. I tried to find one in my 1.2 docs (I
>thought I remembered one), but failed. For 2.0, I expect "It's in
>there." I left enough hints here that I wanted it.

In the real world this isn't an acceptable practice. If I need to run self-modifying code for speed, I won't have time to make an OS call. Or to put it differently, if I had the time to make an OS call, I wouldn't need to use self-modifying code; I could just jump to the appropriate vector.

The only exception I can imagine, where an OS routine of this sort would be useful, is when my program needs to compile something directly into memory. In that case I still think it would be faster either to define that memory segment as uncacheable or (at least temporarily) to turn off the execution cache entirely.

>
> More years ago than I care to remember, I had an instructor point out
>that all we had ever written was self modifying code. Now this instructor
>drilled us through the many courses we had taken that this was a definite
>no no, and I had never stooped so low as to do it, so I was taken aback
>by this statement. What do you mean, self modifying? None of our
>programs would have passed! The point of view he was taking was that
>of the operating system, and that when our program was read into system
>memory, the system's code was modified. This subtle point was never lost
>on me. The system, of course, flushed the cache after loading (if so
>equipped.) You should too.
>

How about some new MEMF_ bits:

        MEMF_ECACHE
        MEMF_DCACHE

where the default is TRUE for DCACHE and FALSE for ECACHE. That would tend to leave the exec() code in the cache (where IMHO it belongs), and user programs would be cached only if they specifically request it. (If you know you are going to be using the CPU for a long time, at a higher priority than anything else on the system, you may as well use the cache.) And programs that do need cache speed would have a better chance of getting it (even after a task switch).

                           -Sullivan Segall
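The proposed allocation bits and their defaults can be sketched as a flag set. These flags are the poster's suggestion, not an existing interface, and the bit positions, the `MemFlags` class and the helper functions below are assumptions made up for the illustration.

```python
from enum import IntFlag


class MemFlags(IntFlag):
    # Bit positions are placeholders, not real allocator flag values.
    ECACHE = 1 << 0  # proposed: memory may be held in the execution cache
    DCACHE = 1 << 1  # proposed: memory may be held in the data cache


def alloc_flags(want_ecache=False, want_dcache=True):
    """Build the flag word with the proposed defaults: DCACHE on, ECACHE off."""
    flags = MemFlags(0)
    if want_ecache:
        flags |= MemFlags.ECACHE
    if want_dcache:
        flags |= MemFlags.DCACHE
    return flags


def may_cache_instructions(flags):
    """True only when the program explicitly asked for execution caching."""
    return bool(flags & MemFlags.ECACHE)


default_flags = alloc_flags()
fast_code_flags = alloc_flags(want_ecache=True)
```

Under these defaults a buffer that receives generated code is never held in the execution cache unless the program opts in, which is the behavior the proposal is aiming for.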
<LEEK@QUCDN.QueensU.CA> (07/17/90)
In article <31749@cup.portal.com>, Sullivan@cup.portal.com (sullivan - segall) says:
>
>It would seem a terrible shame though to lose that capability in the 68040,
>and have to clear 8k worth of cache any time a DMA completes.

Yes, but that has nothing to do with self-modifying code... That's an OS problem.

... stuff referring to previous article deleted....

>In the real world this isn't an acceptable practice. If I need to run
>self modifying code for speed, I won't have time to make an OS call. Or
>to put it differently, if I had the time to make an OS call, I wouldn't
>need to use self modifying code, I could just jump to the appropriate
>vector.

Turning off or flushing a large cache might be a larger performance loss than just doing vector jumps. Why one would want to save a few machine cycles while at the same time making the CPU run 20% - 300% slower (depending on program and CPU) is beyond my comprehension. The major speed advantage of the 020, 030, etc. is due to the caches. Make sure you sit down with the machine cycles for both cases (i.e., cache on with well-behaved code, and cache off with self-modifying code) before making any assumptions.

Flushing 4K worth of instruction cache in an 040 is probably bad. If the amount of time saved justifies it, then go for it. You'll get better performance than by turning off the cache entirely. Do the OS call, as its cost is negligible compared with flushing the cache!!

>
>The only exception I can imagine, where an OS routine of this sort would
>be useful, is when my program needs to compile something directly into
>memory. In that case I still think it would be faster to either define
>that memory segment as uncacheable, or (at least temporarily) turn off
>the execution cache entirely.

Read above for the disadvantages of turning off the cache... Might be better to use the RAM: drive.

>
> MEMF_ECACHE
> MEMF_DCACHE
>
>where the default is TRUE for DCACHE , and FALSE for ECACHE. That would
>tend to leave the exec() code in the cache (where IMHO it belongs) and
>user programs would be cached if they specifically request it. (If you
>know you are going to be using the CPU for a long time, at a higher priority
>than anything else on the system, you may as well use the cache.) And
>programs that do need cache speed would have a better chance of getting
>it (even after a task switch.)
>
> -Sullivan Segall
>

K. C. Lee
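Sitting down with the machine cycles for each case can be reduced to a small cost model. This is a sketch under stated assumptions: the three strategies, the function names and every number fed into them are placeholders, and real values have to come from measuring or counting cycles on the target CPU.

```python
def break_even_calls(flush_cost_cycles, cycles_saved_per_call):
    """Smallest number of calls for which generated code plus one flush
    is cheaper than taking the slower vector jump every time."""
    # Need calls * saved > flush_cost, using whole-number cycle counts.
    return flush_cost_cycles // cycles_saved_per_call + 1


def cheapest(calls, vector_cycles, generated_cycles, flush_cost_cycles, cache_off_slowdown):
    """Name the cheapest of three strategies for a given workload."""
    costs = {
        "vector jump, cache on": calls * vector_cycles,
        "generated code + flush": flush_cost_cycles + calls * generated_cycles,
        "generated code, cache off": calls * generated_cycles * cache_off_slowdown,
    }
    return min(costs, key=costs.get)


# Placeholder workload: 4 cycles saved per call, 1024 cycles lost to a flush.
needed = break_even_calls(1024, 4)
few_calls = cheapest(10, 12, 8, 1024, 3.0)
many_calls = cheapest(10_000, 12, 8, 1024, 3.0)
```

With these made-up figures a flush pays off only after a few hundred calls of the generated code, and running with the cache off never wins, which matches the argument above; the point of the model is that the answer depends entirely on the measured inputs.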
daveh@cbmvax.commodore.com (Dave Haynie) (07/18/90)
In article <31727@cup.portal.com> Sullivan@cup.portal.com (sullivan - segall) writes:
>The 68040 snoops the address bus for writes to cached memory, and updates
>the cache if it sees any. (I.e., you shouldn't ever have to flush the
>cache on any processor >= 68040.)

The 68040 can be set up to snoop the CPU bus in a system. This kind of bus snooping is a really good idea for fully snooped system designs, but it can't always be used as a drop-in in an existing system. However, this kind of snooping does NOT imply that the 68040 will snoop itself -- e.g., it doesn't necessarily follow that the I-cache will be invalidated by an aliased write to D-space.

Interestingly enough, several years ago, before the MC68851 was released, Commodore was working on a cache+MMU chipset. It was never completed, but it promised to deliver a 0-wait 2K logical cache to a 16MHz 68020, with snooped, copyback capability and burst fetches. The internals were different, and the cache was of course a unified logical 2-set associative cache rather than separate physical 4-set associative caches, but it did wind up looking a little more like the 68040, at least philosophically, than one might have expected.

> -Sullivan Segall

--
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
{uunet|pyramid|rutgers}!cbmvax!daveh PLINK: hazy BIX: hazy
"I have been given the freedom to do as I see fit" -REM