nather@ut-sally.UUCP (Ed Nather) (07/09/88)
In article <2972@geac.UUCP>, daveb@geac.UUCP (David Collier-Brown) writes:
> In article <429@uwovax.uwo.ca> 16012_3045@uwovax.uwo.ca (Paul Gomme) writes:
> [discussion of execute-only code segments]
> > Besides, I thought that self-modifying code was (a) extremely difficult
> > to write, and (b) considered poor programming practice.
>
> Yes, it is and it is.

This practice was formally named "horrid" about the time the GOTO was
banished, and for the same reasons: it's *very* hard to figure out what is
going on if either practice is exhibited unconstrained.  Yet the idea of
"self-modifying code" was one of the great motivators for encoding
instructions in the same form as data, and holding them in the same kind of
memory.  Prior to that time, "instructions" were mostly wires on a plugboard
and were hard to change in software.

It did take a long time and a lot of thought to make the GOTO really
unnecessary -- it's still hidden in C but not much used -- yet a similar
effort has not been applied to self-modifying code, and I've always wondered
why.  My guess -- and it's only a guess -- is that there has not been much
emphasis on real-time, time-critical programming techniques in most CS
departments, so the need to get maximal speed from a particular chunk of
hardware was not evident.

But it's really needed -- if you have to watch an animated display of data
on a CRT all night long that blinks and flickers, you learn very quickly
that techniques to get display refresh faster than the human
flicker-frequency can save lots of pounding headaches.  If they are
"forbidden" techniques that require a descent into assembly coding, so be
it.  Headaches hurt.

Are the seeds of a new religion hidden here?
--
Ed Nather
Astronomy Dept, U of Texas @ Austin
{backbones}!{noao,ut-sally}!utastro!nather
nather@astro.as.utexas.edu
mcdonald@uxe.cso.uiuc.edu (07/10/88)
> [discussion of execute-only code segments]
> > Besides, I thought that self-modifying code was (a) extremely difficult
> > to write, and (b) considered poor programming practice.
>
> Yes, it is and it is.

Yes, it is and yes it is, but only by those who don't need it.  There are
good uses for self-modifying code: I have used it recently in two places
(one on an IBM PC and the second on a PDP-11).

The first is the use of self-modifying code in interrupt handlers.  It is
necessary sometimes to change data segment in such a handler.  The only
place to put the value for the data segment is in the code segment, because
when the interrupt occurs, only the CS register is valid.  You could
quibble about whether this would be self-modifying code if you put the
value out-of-line and got it through a pointer.  This example was in
assembler, of course.

The second use is not absolutely necessary -- but the results sure are
nice.  This involves changing the code in graphics routines on the fly.
Absolutely nothing is more important than making write-to-screen routines
fast (well, nothing having to do with computers).  I have an example where
4 decisions about 1 instruction each need to be made at run time.  I can do
three things:

1. Use a case statement or an if to select the proper one based on a flag.
2. Put the little area of code where these occur in a subroutine and call
   the proper one through a pointer.  Unfortunately there would be maybe
   10 subroutines with the various choices.
3. Set up the code at run time, which is self-modifying code.

Possibility 1 is slow.  No doubt about it.  It reduces something which
should occur in the blink of an eye to a crawl.  Slowness is absolutely
forbidden, so this is out.

Possibility 2 won't be slow if a big enough chunk of code is included
around the messy part, but it is memory intensive.  If the smallest
fast-enough chunk is 200 bytes, this adds up to 2000 bytes.

Possibility 3 works fine and is fast enough, but requires assembler
intervention -- doing it in C is possible but actually harder (a lot
harder).  It also upsets certain religious net-persons.

What did I actually do?  On the PC, I used 10 different subroutines.  On
the PDP-11, where there is only 56K memory, I used self-modifying code.

This brings up another point.  A fourth alternative, possible in Fortran
but not in C, is the computed or assigned goto.  The overhead for these is
less than a subroutine call, so the amount of code you need in a clump for
it to be fast enough is small.  Alternatively, if the compiler were smart
enough (mine wasn't) it could convert a case statement into the equivalent
of a computed goto.

Doug McDonald
pardo@june.cs.washington.edu (David Keppel) (07/11/88)
[ What about that self-modifying code ]

nather@astro.as.utexas.edu (Ed Nather) writes:
> "horrid" [ hard to write, understand, maintain ]
>
> But it's really needed -- if you have to watch an animated display
> of data on a CRT all night long that blinks and flickers, you learn
> very quickly that techniques to get display refresh faster than the
> human flicker-frequency can save lots of pounding headaches.  If
> they are "forbidden" techniques that require a descent into
> assembly coding, so be it.  Headaches hurt.

Some reasons NOT to write S-M (Self Modifying or Sado-Masochistic) code:

+ It's machine dependent (we already knew that).

+ Read-only (e.g., code) pages may be shared.

+ Some machines have sperate I and D spaces.  It's not inconceivable
  that different machines in the same family will have different
  memory models.

+ Many machines use seperate instruction and data caches.  Instruction
  caches can be made much simpler (= smaller = faster) by not
  requiring them to pay attention to the data bus.  Writing to a
  memory location is not guaranteed to update the I cache, thus you
  may or may not execute the modified instruction.  Next week's model
  may have a large enough cache that this week's code breaks.

Good enough?

	;-D on ( Compiler hints won't save you here )  Pardo
gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/11/88)
In article <225800044@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
> I can do three things:

Although you didn't spell out the details of your situation, I'm sure there
are quite a few other possibilities.  For example, one can index an array
to obtain the switch flag.

As a significant example of C's efficiency for graphics programming,
virtually all the code in the Blit, DMD (5620), and MTG (630) bitmapped
terminals was written in C, and their graphics operations are extremely
fast.  No self-modifying code was necessary.

One of my "back burner" projects is to produce a display-list driven
interactive 3D viewer for these terminals (and probably port it to Suns).
I have no doubt that it can be done quite nicely while sticking to the C
language rules.
nather@ut-sally.UUCP (Ed Nather) (07/11/88)
In article <5254@june.cs.washington.edu>, pardo@june.cs.washington.edu (David Keppel) writes:
> [ What about that self-modifying code ]
> nather@astro.as.utexas.edu (Ed Nather) writes:
> > "horrid"

Tsk tsk -- out of context, like the movie ad that quotes a reviewer who
said: "This movie is a great waste of time -- it's insipid, boring, and
stupid.  I've never seen anything like it."  The ad quoted:
"...great...I've never seen anything like it."  I said the practice "...was
formally declared horrid ...", NOT that *I* thought it was.  I don't think
that.

> Some reasons NOT to write S-M (Self Modifying or Sado-Masochistic) code:
> + It's machine dependent (we already knew that).

Assembly code does tend to be machine dependent, all right -- but C was
designed to give the programmer the same facility, with better control and
better discipline.  It works quite well for those things it covers.  It is,
in some reasonable sense, portable.  A good compiler could generate
self-modifying code to suit, I'm sure.  What we lack is the formal
discipline for using it wisely and well.

> + Read-only (e.g., code) pages may be shared.

Only in a time-sharing system.  Most real-time control programs can't
tolerate such an environment for lots of reasons, mostly due to critical
timing requirements.  Anyway, if generated code is treated as "executable
data" this whole point becomes irrelevant.

> + Some machines have sperate I and D spaces.  It's not inconceivable
>   that different machines in the same family will have different
>   memory models.

Not a problem.  Generated executable code changes, just like any other
variable values, and can be treated the same way.

> + Many machines use seperate instruction and data caches.  Instruction
>   caches can be made much simpler (= smaller = faster) by not
>   requiring them to pay attention to the data bus.  Writing to a
>   memory location is not guaranteed to update the I cache, thus you
>   may or may not execute the modified instruction.  Next week's model
>   may have a large enough cache that this week's code breaks.

Again, not a problem.  You don't take the generated code out of the I
cache, but out of the D cache, since it can (by definition) change.

> Good enough?

Sorry, no.  I've heard LOTS of arguments against programs that generate
their own code, and all of them -- so far as I am aware -- assume the
"proper" answer in generating arguments to show the answer is proper.  Be
honest: suppose you *had* to do it this way, but needed a formal discipline
to keep it clean and understandable, and *fast.*  Couldn't you find a
reasonable way?

"It's unclean because ... well, because it's so ... so ... UNCLEAN!"
--
Ed Nather
Astronomy Dept, U of Texas @ Austin
{backbones}!{noao,ut-sally}!utastro!nather
nather@astro.as.utexas.edu
boyne@hplvly.HP.COM (Art Boyne) (07/11/88)
pardo@june.cs.washington.edu (David Keppel) writes:
> Many machines use seperate instruction and data caches.  Instruction
                    ^^^^^^^^
To quote a high school chemistry teacher: 'When will you guys realize that
there is "a rat" in sep*arat*e?'

Art Boyne, !hplabs!hplvly!boyne

'Think, guys, think - it only hurts for a little while!'
rwhite@nusdhub.UUCP (Robert C. White Jr.) (07/12/88)
in article <225800044@uxe.cso.uiuc.edu>, mcdonald@uxe.cso.uiuc.edu says:
> Yes, it is and yes it is, but only by those who don't need it.  There
> are good uses for self-modifying code: I have used it recently in
> two places (one on an IBM PC and the second on a PDP-11).
> The first is the use of self-modifying code in
> interrupt handlers.  It is necessary sometimes to change data segment
> in such a handler.  The only place to put the value for the data segment
> is in the code segment, because when the interrupt occurs, only
> the CS register is valid.  You could quibble about whether this would
> be self-modifying code if you put the value out-of-line and got it
> through a pointer.  This example was in assembler of course.

Point 1) This is _not_ "self modifying" code as I learned it...  "self
modifying code" is things (in IBM PC assembler) like:

	tests:	jbne	27
		je	27
		ja	27

	proc	aaa	far
		movsw	tests[SI],ordin
		test	AX,BX
	ordin:	je	27
		...
	endp

where an entry condition changes the body of the code to reflect the data,
instead of writing the code to handle every inline possibility as inline
options.  This is bad practice, and nearly impossible to follow when the
substituted code sections are larger.

Point 2) You should place all your data and code for an interrupt on an
IBM PC in one segment and reach it through CS anyway.  Once there you may
juggle registers to your heart's content.  To find "local" data, you make
load-address, or CS-segment-data-spaces to store the relative information
which you set up during init, or even as EXEC loader directives; to wit:
dw CSEG; dw OFFSET init. (etc.)
lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) (07/12/88)
In article <12357@ut-sally.UUCP> nather@ut-sally.UUCP (Ed Nather) writes:
> In article <5254@june.cs.washington.edu>, pardo@june.cs.washington.edu (David Keppel) writes:
> > [ What about that self-modifying code ]
....
> > Some reasons NOT to write S-M (Self Modifying or Sado-Masochistic) code:
....
> > + Read-only (e.g., code) pages may be shared.
>
> Only in a time-sharing system.  Most real-time control programs can't
> tolerate such an environment for lots of reasons, mostly due to critical
> timing requirements.  Anyway, if generated code is treated as "executable
> data" this whole point becomes irrelevant.

There are time-sharing systems that can handle critical tasks (e.g.
real-time applications with strict timing requirements) well.  These
systems have the possibility to give tasks priorities.  A critical task can
be given a high priority to enable it to meet its time limits.  Such a
system may very well have code that is shared.

Another related issue is that self-modifying code cannot be executed
directly from ROM.  Executing programs in ROM is an important memory-saving
technique in small, diskless, special-purpose systems.

> > Good enough?
>
> Sorry, no.  I've heard LOTS of arguments against programs that generate
> their own code, and all of them -- so far as I am aware -- assume the
> "proper" answer in generating arguments to show the answer is proper.  Be
> honest: suppose you *had* to do it this way, but needed a formal
> discipline to keep it clean and understandable, and *fast.*  Couldn't you
> find a reasonable way?
>
> "It's unclean because ... well, because it's so ... so ... UNCLEAN!"

I guess there are situations with extreme performance requirements where
self-modifying code is justified.  It would be interesting to see what a
semantics for self-modifying programs would look like.

Regarding the "uncleanliness" I think there are two main sources for this.
The first is the difficulty for a human mind to trace what's going on in a
self-modifying program.  The second is an advantage read-only data (e.g.
non-self-modifying code) has over writable data: read-only data can be
accessed asynchronously, in any order, even concurrently, without any
constraints.  Therefore read-only is "clean" in a concurrent environment.
An example apart from code is data bases, where read-only data don't have
to be locked during transactions.

Bjorn Lisper
pardo@june.cs.washington.edu (David Keppel) (07/12/88)
nather@ut-sally.UUCP (Ed Nather) writes:
> pardo@june.cs.washington.edu (David Keppel) writes:
> > [ Out of context quote ]
> [ Funny retort. ]

Sorry.  I do like your rejoinder, though.  Could be about self-modifying
code, even :-)

> Sorry, no.  I've heard LOTS of arguments against programs that
> generate their own code, and all of them -- so far as I am aware
> -- assume the "proper" answer in generating arguments to show the
> answer is proper.  Be honest: suppose you *had* to do it this way,
> but needed a formal discipline to keep it clean and understandable,
> and *fast.*  Couldn't you find a reasonable way?

Let me rephrase my position.  There is nothing architecturally weird about
programs that generate their own code.  Doing so does not cause any
problems on any machines that I am aware of, although few OPERATING SYSTEMS
support this.

There is a *major* problem with modifying existing code.  Consider a
machine with separate I-cache and D-cache.  If you execute an instruction
at location 0x500, then one at 0x504 that branches back to the one at
0x500, and the one at 0x500 modifies itself, either:

+ The cache can be built so that it pays attention to the changing data
  (code) in the I-cache.  This requires a "snoopy" cache, which is more
  complicated, less dense, and slower.  If you are constrained by price or
  die space, you may be forced to use a slower, smaller (worse hit-ratio)
  cache.

+ You can execute out of the D-cache.  This is effectively the same as
  having a snoopy I-cache.  If you have BOTH an I-cache and a D-cache,
  you're left with coherency problems that are solved by building a snoopy
  I-cache.

+ You can define this operation to be illegal, build a smaller, faster,
  non-snoopy cache, and pay the price when you need to run code that could
  be expressed "naturally" as self-modifying code.

Assume that you have an algorithm that is, say, 1000 instructions and that
each instruction takes 4 cycles to execute (must be a RISC :-).  One
particular part of it, 10 instructions, is written as self-modifying code.
It can also be written as 100 instructions of non-self-modifying (static)
code.  Further assume that the effect of having a non-snoopy cache is to
make the computer run 10% faster (e.g., the cache is faster AND has a
better hit rate).  Then the running time of the program on the machine with
the snoopy I-cache is:

	T = 990 * 4.0 + 10 * 4.0 = 4000 cycles

On the machine without a snoopy I-cache (10% faster),

	T = 990 * 3.6 + 100 * 3.6 = 3924 cycles

which is marginally FASTER for NOT using self-modifying code, even though
it takes an additional 90 instructions.

	;-D on ( Cycles to burn reading flames )  Pardo
mash@mips.COM (John Mashey) (07/12/88)
In article <12357@ut-sally.UUCP> nather@ut-sally.UUCP (Ed Nather) writes:
....
> > + Many machines use seperate instruction and data caches.  Instruction
> >   caches can be made much simpler (= smaller = faster) by not
> >   requiring them to pay attention to the data bus.  Writing to a
> >   memory location is not guaranteed to update the I cache, thus you
> >   may or may not execute the modified instruction.  Next week's model
> >   may have a large enough cache that this week's code breaks.
>
> Again, not a problem.  You don't take the generated code out of the I
> cache, but out of the D cache, since it can (by definition) change.

No, machines that do this don't work that way, i.e., you only execute code
from the I-cache or main memory, not from the D-cache (or the cache
currently acting as the D-cache, in cases where you can swap them).

There is nothing wrong with generating code on the fly.  What is wrong is
assuming a model of machines that looks just like a CPU + DRAM, or with
caches that act only like coherent, joint I&D caches.  This is just one
instance of a general problem: how do you handle caches that are visible to
the executable code in any way?  What's needed is a good set of
cache-control primitives that:

a) Cover all of the cases
b) Become widely used enough for people to count on.
c) Are easy to make nops for environments where they don't matter.
--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
chasm@killer.UUCP (Charles Marslett) (07/12/88)
In article <33441@yale-celray.yale.UUCP>, lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) writes:
> Another related issue is that self-modifying code cannot be executed
> directly from ROM.  Executing programs in ROM is an important
> memory-saving technique in small, diskless, special-purpose systems.

Actually, I always thought that ROM was much more important as a way of
eliminating the need to load the code and as a way of guaranteeing
non-self-modifying-ness.  Usually ROM code is larger (thus
memory-consuming) than RAM-loaded code.  This is because most RAM based
systems assume some form of self-modification of code, ranging from
relocation tables that avoid the need for self-relative addressing
everywhere to file initialization code that gets overwritten by its own
buffer.

On the other hand, some RAM based systems require 4 Mb to run an editor, so
you may be right in those cases :->} !!

Seriously, newer architectures have reduced the difference (to 0 perhaps),
but the emphasis on RISC these days may resurrect self-modifying code -- a
RISC-er technique is not known to mankind!  (:-D}===<<).

Charles Marslett
hjm@cernvax.UUCP (hjm) (07/12/88)
If the program is considered to be data for an interpreter which I will for
convenience call a CPU, then does self-modifying code seem any different?
As long as the data in this table remains consistent, then there should be
no difficulty in dealing with it mathematically.  All of the problems of
database consistency have been tackled (but not all solved) some time ago.

Code is data according to a PATCH program, and the only tricky bits there
are to ensure that any changes fit inside the space that they replace and
to patch up the jumps where necessary.  A bit like a structure linked by
pointers, such as a binary tree, except that code is in general a directed
cyclic graph (the directedness is due to the program counter).

Now, if the routines that ensure the consistency of the code change
themselves, then the world will most likely fall down around one's head,
but then things get to be so recursive that my stack overflows ...

Hubert Matthews
nather@ut-sally.UUCP (07/12/88)
In article <33441@yale-celray.yale.UUCP>, lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) writes:
> I guess there are situations with extreme performance requirements where
> self-modifying code is justified.  It would be interesting to see what a
> semantics for self-modifying programs would look like.

I agree.  That was the whole point of my original posting -- to see if
someone with a talent for languages could ignore the current prejudice
against this practice, and see what might be done.  If there's really no
way to do it in a reasonable manner, fine -- but just *saying* it's awful
doesn't *make* it awful.

> The first [source of uncleanliness] is the difficulty for a human mind
> to trace what's going on in a self-modifying program.

I think this arises because of the two different intellectual "levels"
involved: what the program is doing when it generates the instructions (by
analogy with the instruction generator in a compiler) and what those
instructions are going to do when they are executed.  These *must* be kept
completely separate or massive confusion results.  But if the written code
is approached with full awareness of this requirement, the confusion
vanishes -- in my experience, anyway.

> The second is an advantage read-only data (e.g. non-self-modifying code)
> has over writable data: read-only data can be accessed asynchronously,
> in any order, even concurrently, without any constraints.  Therefore
> read-only is "clean" in a concurrent environment.

I believe self-modifying code was condemned long before concurrent
environments were even possible.  This may be a disadvantage, but *no*
technique exists without trade-offs.  I think we can live with this one.

> An example apart from code is data bases, where read-only data don't
> have to be locked during transactions.

OK, but I'm not sure this is relevant here.  I do, however, appreciate your
keeping "data" plural.  It happens so rarely nowadays ...
--
Ed Nather
Astronomy Dept, U of Texas @ Austin
{backbones}!{noao,ut-sally}!utastro!nather
nather@astro.as.utexas.edu
dg@lakart.UUCP (David Goodenough) (07/13/88)
From article <225800044@uxe.cso.uiuc.edu>, by mcdonald@uxe.cso.uiuc.edu:
> > > [discussion of execute-only code segments]
> > > Besides, I thought that self-modifying code was (a) extremely
> > > difficult to write, and (b) considered poor programming practice.
> >
> > Yes, it is and it is.
>
> Yes, it is and yes it is, but only by those who don't need it.  There
> are good uses for self-modifying code: I have used it recently in
> two places (one on an IBM PC and the second on a PDP-11).

These are MY OPINIONS ONLY - you are free to agree and disagree and flame
as you see fit.

I have used S-M-C only once, when doing a section of code that handled
single stepping.  The problem W/ the Z80 (comp.lang.c ??????) is that it
has conditional jumps, calls AND returns.  Now I go and fetch an
instruction out of the code portion (i.e. where my PC is pointing to).
It's 0xc2.  Aha, I have a conditional instruction.  Now to figure out
whether the condition is met I have two choices:

1. Decode the bits that determine which flag is being looked at, and
   whether the flag should be set or reset.  Get the flags into some
   register where I can use them.  Mask out the flag in question.  Do a
   conditional jump on the result of the mask and whether the flag should
   be set or clear.

2. Turn the instruction into a conditional jump (ANDing with one mask and
   then ORing with another turns any conditional (except the relative
   jumps) into a conditional jump).  Drop the conditional jump into a
   hole - get the flags and do the jump.

If someone wants to see the code that I produced for both of the above I
can e-mail.  I ask you to take it on faith that 1 was about 40-50 lines,
whereas 2 was 6 lines.  Also BECAUSE I COMMENTED, it was possible to
figure out what was going on.  My mark of good commenting is code that can
be read a year later and still understood.  I agree that this is not for
the faint of heart, but it was faster, and to my mind easier to
understand.

Note also that in this application speed was at a premium: when I'm single
stepping 1000 instructions I want things to happen PDQ, as I would like to
see the program appear to run as fast as possible.  But then I'm a little
insane, because who in their right mind writes a single step utility for a
dead micro like the Z80 :-)

Like everything it has its place, and is not to be misused.  Misuse of
S-M-C *_IS_* a sin (well I think so), but where it is justified I will use
it.
--
dg@lakart.UUCP - David Goodenough		+---+
						| +-+-+
....... !harvard!cca!lakart!dg			+-+-+ |
						  +---+
baum@Apple.COM (Allen J. Baum) (07/13/88)
In article <5262@june.cs.washington.edu> pardo@uw-june.UUCP (David Keppel) writes:
> There is a *major* problem with modifying existing code.  Consider a
> machine with seperate I-cache and D-cache.  If you execute an
> instruction at location 0x500, then one at 0x504 that branches back
> to the one at 0x500, and the one at 0x500 modifies itself, either:
>
> + The cache can be built so that it pays attention to the changing
>   data (code) in the I-cache....
> + You can execute out of the D-cache....
> + You can define this operation to be illegal....

or you can recognize the situation and issue a write-back-Dcache-line
instruction, and a flush-Icache-line instruction, and everything works.

Note that a compile-and-go situation is exactly this; you are building a
program in the Dcache, and then you want to execute it.  While it may be
unlikely that any remnants are left over after linking, et al., this may be
the only way to guarantee it.
--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385
lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) (07/13/88)
In article <4776@killer.UUCP> chasm@killer.UUCP (Charles Marslett) writes:
> In article <33441@yale-celray.yale.UUCP>, I write:
> > Another related issue is that self-modifying code cannot be executed
> > directly from ROM.  Executing programs in ROM is an important
> > memory-saving technique in small, diskless, special-purpose systems.
>
> Actually, I always thought that ROM was much more important as a way of
> eliminating the need to load the code and as a way of guaranteeing
> non-self-modifying-ness.  Usually ROM code is larger (thus
> memory-consuming) than RAM-loaded code.  This is because most RAM based
> systems assume some form of self-modification of code, ranging from
> relocation tables that avoid the need for self-relative addressing
> everywhere to file initialization code that gets overwritten by its own
> buffer....

As I wrote, I was actually thinking primarily of diskless systems.  In
such a system the program is stored in ROM.  If the program is
self-modifying, its code has to be loaded into RAM before it can execute.
If not, it can be executed directly in ROM and only the data areas need to
be RAM.  Thus memory is saved (we don't need the extra RAM to hold the
code).

Bjorn Lisper
hjm@cernvax.UUCP (hjm) (07/13/88)
I have been mulling over the idea of self-modifying code (SMC) for a while and
I've come to the conclusion that there is no good definition of SMC.
For example, if treating code as data is the definition, then does passing a
procedure as a parameter in PASCAL, or a pointer to a function in C count?
Probably not.
OK, what about a jump table. Consider an array of pointers to functions in C.
Does changing the pointers count as SMC? Again, I don't think so.
So, changing a pointer by assigning to it is not SMC, but putting a new jump
instruction in (e.g. jmp #somewhere_else) in place of an existing instruction
is SMC. Does one level of indirection really make that much difference?
Of course, if you want to be really horrid in C, you can try something like
this:
char codearray[] = { 0x03,   /* one byte instruction for something */
                     0xc9    /* one byte return instruction */
                   };
and then 'call' this function using a cast to turn the pointer codearray into
a pointer to a function. (Who needs #asm anyway!) Then you can modify the
code as much as you want. This _is_ SMC without a doubt, because you can
overwrite code. So, I propose a very weak definition for SMC as code that
writes over other code.
As a final note, why is it 'clean' to alter a jump table and 'unclean' to alter
an inline constant (e.g. jmp @offset(r0) uses a value in memory as the address
but mov (pc)+,#1234 which loads an immediate does so too)? Why the subtle
difference? Any thoughts on the subject?
Hubert Matthews
(I don't consider LISP or PROLOG programs that create code on the fly to be
SMC. Does anyone disagree?)
krohn@u1100a.UUCP (Eric Krohn) (07/13/88)
In article <8239@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
] As a significant example of C's efficiency for graphics programming,
] virtually all the code in the Blit, DMD (5620), and MTG (630)
] bitmapped terminals was written in C, and their graphics operations
] are extremely fast. No self-modifying code was necessary.
Virtually all? Yes. The most critical? No.
The bottom-most level of all screen updates is the infamous
bitblt function. A 1982 BTL memo by John Reiser tells about the on-the-fly
code generation done by bitblt to achieve rather good performance on the
original Blit. I've assumed that similar code was carried forward to the 630
and the 5620 (even with the CPU switch).
My 630's host is down for hardware upgrade now, so I can't disassemble the
firmware to see what code is really there.
--
Eric J. Krohn
krohn@ctt.ctt.bellcore.com or {bcr,bellcore}!u1100a!krohn
Bell Communications Research, 444 Hoes Ln, Piscataway, NJ 08854
hankd@pur-ee.UUCP (Hank Dietz) (07/14/88)
In article <12360@ut-sally.UUCP>, nather@ut-sally.UUCP (Ed Nather) writes:
> In article <33441@yale-celray.yale.UUCP>, lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) writes:
> >
> > I guess there are situations with extreme performance requirements
> > where self-modifying code is justified.  It would be interesting to
> > see what a semantics for self-modifying programs would look like.
>
> I agree.  That was the whole point of my original posting -- to see if
> someone with a talent for languages could ignore the current prejudice
> against this practice, and see what might be done.  If there's really no
> way to do it in a reasonable manner, fine -- but just *saying* it's
> awful doesn't *make* it awful.

The key problem with self-modifying code is that code created at runtime
may REPLACE code that appeared at "compile" time, hence, in general, no
STATIC understanding of the code is possible.  This can make it hard for
humans to understand the code, but more importantly it implies that
compilers cannot perform optimizations, e.g., you cannot perform register
allocation / value propagation / common subexpression elimination /
in-line expansion, because the code you remove might be code which is
modified at runtime.

The fix need not be the elimination of the generate-and-execute
programming concept; rather, the language merely needs to specify
constraints on the code which is to be generated at runtime.  For example,
key constraints would be that the runtime-generated code could not
redefine any existing function, could only be invoked through a
statically-specified interface, and could only access data which either
was statically declared as accessible through that interface or was
created as new data local to the new code.  In other words, we can permit
code to add to itself, but not to redefine what existed at compile time:
self-extending code.

An outline of fixes to be applied to Lisp (to create "Refined Lisp")
appears in my PhD thesis: H. Dietz, "The Refined-Language Approach to
Compiling for Parallel Supercomputers," June 1987.  The folks working on
Scheme had similar ideas....

-hankd
radford@calgary.UUCP (Radford Neal) (07/14/88)
There are interesting cases where on-the-fly generation of code seems
to be essential to get good asymptotic space and/or time complexity.

Consider the following code fragment (version A):

    for (i = 0; i<L; i++)
    {   if (c1) p1();
        if (c2) p2();
        ...
        if (cN) pN();
    }

The Boolean variables c1, c2, ... cN are assumed to be loop invariants.
Assuming the pi all take the same constant time and code space, the time
complexity of this code is O(N), and its space complexity is also O(N).
(I'm ignoring the factor of L for number of loop iterations.)

Now consider the following equivalent code, specialized for N=2, but
clearly generalizable to any N (version B):

    if (c1)
    {   if (c2) for (i = 0; i<L; i++) { p1(); p2(); }
        else    for (i = 0; i<L; i++) { p1(); }
    }
    else
    {   if (c2) for (i = 0; i<L; i++) { p2(); }
        else    /* Nothing */ ;
    }

Code like this will have a time complexity of O(M), where M is the number
of ci that are actually true, and a space complexity of O(2^N).

Let's say that M, the number of true ci, is typically logN.  This gives
the following comparison for versions A and B:

    version    space      time
       A       O(N)       O(N)
       B       O(2^N)     O(logN)

Thus you get to choose between time that is exponentially worse than it
might be, or space that is exponentially worse than it might be.

The following lets you get both (version C):

    start generating code;
    if (c1) generate instruction to call p1;
    if (c2) generate instruction to call p2;
    ...
    if (cN) generate instruction to call pN;
    for (i = 0; i<L; i++) execute generated instructions;

This gives you the low time complexity of version B, with the low space
complexity of version A.  It's machine dependent, however.

It _is_ possible to get the same form of time complexity without machine
dependent operations, as follows (version D):

    void (*pa[N+1])();      /* array of N+1 pointers to functions */

    j = 0;
    if (c1) pa[j++] = p1;
    if (c2) pa[j++] = p2;
    ...
    if (cN) pa[j++] = pN;
    pa[j] = 0;

    for (i = 0; i<L; i++)
    {   for (j = 0; pa[j]; j++) (*pa[j])();
    }

This essentially works by generating code in a very specialized language,
and then interpreting that code.  This gives the same form of time
complexity as version B, but with a larger constant factor - perhaps
considerably larger if the pi aren't really procedures, but just short
code sequences.

This is the general sort of reason for the bit-blit algorithms that
generate code on the fly, and I've encountered the same sort of situation
in other contexts (e.g. a "life" program).

One could imagine a compiler performing the transformation from version A
to the machine-dependent version C, but I've never heard of such.  Even if
it did, it is a bit disturbing that a transformation of such significance
for space/time complexity isn't expressed at the source level.  This would
be solved by having the compiler transform version D to the faster
version C, but that's probably even harder for the compiler-writer to
accomplish than the A to C transformation.

   Radford Neal
peter@ficc.UUCP (Peter da Silva) (07/14/88)
I have a question: Why are an Icache plus a Dcache better than just a big shared cache as big as both? -- Peter da Silva `-_-' Ferranti International Controls Corporation. "Have you hugged U your wolf today?" (uunet,tness1)!sugar!ficc!peter.
nather@ut-sally.UUCP (Ed Nather) (07/14/88)
In article <1100@nusdhub.UUCP>, rwhite@nusdhub.UUCP (Robert C. White Jr.) writes:
> "self modifying code" are things (in IBMPC assem) like:
> [ code omitted]
>
> Where an entry condition changes the body of the code to reflect
> the data, instead of writing the code to handle every inline
> possibility as inline options.  This is bad practice, and nearly
> impossible to follow when the substitutive code sections are larger.

Large programs written in assembler are hard to follow under any
conditions.  If the code is carefully documented, with the *intent* of the
code clearly spelled out, I don't think this is harder to follow than any
other technique, and I don't agree that it is "bad practice."  If it keeps
a volatile display from blinking, which it can because it can be much
faster than branch-and-test-condition code, then I would label it as
*good* practice.

It would be even better if it could be done in a HLL like, say, C -- with
dangerous and confusing possibilities sharply restricted by the language
itself so the resulting code can be readily understood.  That was the idea
behind "structured programming," back when it was a religion.
-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{backbones}!{noao,ut-sally}!utastro!nather
nather@astro.as.utexas.edu
jackson@esosun.UUCP (Jerry Jackson) (07/15/88)
In article <752@cernvax.UUCP> hjm@cernvax.UUCP (hjm) writes:

>> As a final note, why is it 'clean' to alter a jump table and 'unclean' to alter
>> an inline constant (e.g. jmp @offset(r0) uses a value in memory as the address
>> but mov (pc)+,#1234 which loads an immediate does so too)?  Why the subtle
>> difference?  Any thoughts on the subject?         Hubert Matthews
>>
>> (I don't consider LISP or PROLOG programs that create code on the fly to be
>> SMC.  Does anyone disagree?)

The main difference that I see is that the code in one case is not
reentrant.  In systems like unix where it is possible for more than one
process to share a code segment at runtime, the jump table is local to
each process while the code is not.  Imagine if vi had true self-modifying
code in it...  (Many users typically share a code segment for programs
like vi and csh.)

The example in which you create an array and then call it as a function
does not really fit this definition of self-modifying, since the code that
is changed is created in data space at runtime and hence cannot interfere
with other programs using the same code segment.  *Real* SMC would be
something like:

    main()
    {
        char func();
        long *longptr;

        longptr = (long *)func;
        *(longptr + 20) = 0xffffffff;   /* shudder!!!!! */
    }

+-----------------------------------------------------------------------------+
| Jerry Jackson                   UUCP: seismo!esosun!jackson                  |
| Geophysics Division, MS/22      ARPA: esosun!jackson@seismo.css.gov          |
| SAIC                            SOUND: (619)458-4924                         |
| 10210 Campus Point Drive                                                     |
| San Diego, CA  92121                                                         |
+-----------------------------------------------------------------------------+
barmar@think.COM (Barry Margolin) (07/15/88)
In article <749@cernvax.UUCP> hjm@cernvax.UUCP () writes:
>If the program is considered to be data for an interpreter which I will for
>convenience call a CPU, then does self-modifying code seem any different?  As
>long as the data in this table remains consistent, then there should be no
>difficulty in dealing with it mathematically.  All of the problems of database
>consistency have been tackled (but not all solved) some time ago.  Code is data
>according to a PATCH program, and the only tricky bits there are to ensure that
>any changes fit inside the space that they replace and to patch up the jumps
>where necessary.  A bit like a structure linked by pointers, such as a binary
>tree, except that code is in general a directed cyclic graph (the directedness
>is due to the program counter).

This view is valid, but it simplifies the issues.  If you are going to
think of the program as a database, you must actually consider it as a
distributed database.  This is because the data is replicated in several
places: the physical memory, the swapping device, the CPU caches, and
internal processor registers.  The problems of maintaining consistency in
replicated databases are much less well understood than concurrency
control in non-distributed databases.

Additionally, any form of consistency maintenance must incur some
performance penalty, and hardware designers are under great pressure to
make machines go as fast as possible.  Multiple caches that do not
maintain consistency are one answer to this.  Since self-modifying code
has generally been considered bad style for quite some time, they felt
comfortable implementing a hardware optimization that assumes code will
not modify nearby locations.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
bclaus@wright.EDU (Brian Clausing) (07/15/88)
In article <5262@june.cs.washington.edu>, pardo@june.cs.washington.edu (David Keppel) says:
> [...discussion about self-modifying code.  It fouls up caches.  It is
> intellectually unmanageable with current ISPs....]

As to the semantics of self-modifying code, isn't it identical to a Turing
machine's, or am I missing something?  Of course, reduction machines, like
Turner's combinator implementation of SASL, are inherently self-modifying.

MB Clausing
jfh@rpp386.UUCP (John F. Haugh II) (07/15/88)
In article <3950008@hplvly.HP.COM> boyne@hplvly.HP.COM (Art Boyne) writes: >pardo@june.cs.washington.edu (David Keppel) writes: > >> Many machines use seperate instruction and data caches. Instruction > ^^^^^^^^ >To quote a high school chemistry teacher: > >'When will you guys realize that there is "a rat" in sep*arat*e?' to paraphrase andrew jackson, president of the united states, and face on the twenty dollar bill, how dull the mind which can only think of one spelling for a word. - john. -- John F. Haugh II +--------- Cute Chocolate Quote --------- HASA, "S" Division | "USENET should not be confused with UUCP: killer!rpp386!jfh | something that matters, like CHOCOLATE" DOMAIN: jfh@rpp386.uucp | -- with my apologizes
baum@Apple.COM (Allen J. Baum) (07/15/88)
[]
>In article <1087@ficc.UUCP> peter@ficc.UUCP (Peter da Silva) writes:
>I have a question:
>
>        Why are an Icache plus a Dcache better than just
>        a big shared cache as big as both?

In terms of hit/miss ratios, a unified cache is clearly better.  However,
in terms of effective access time, a split I/D cache is better.  This is
because both can be accessed simultaneously, instead of one having to wait
for the other to finish.  If both are getting accessed simultaneously
(which should happen 40% of the time, if loads/stores account for 40% of
instructions), then this more than offsets the increase in miss ratio.

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum   (408)973-3385
tim@amdcad.AMD.COM (Tim Olson) (07/16/88)
In article <1087@ficc.UUCP> peter@ficc.UUCP (Peter da Silva) writes: | I have a question: | | Why are an Icache plus a Dcache better than just | a big shared cache as big as both? When the processor has separate instruction and data buses, so concurrent accesses can occur to both caches. -- -- Tim Olson Advanced Micro Devices (tim@delirun.amd.com)
lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) (07/16/88)
In article <1744@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes:
>
>There are interesting cases where on-the-fly generation of code seems
>to be essential to get good asymptotic space and/or time complexity.
>
>Consider the following code fragment (version A):
>
>    for (i = 0; i<L; i++)
>    {   if (c1) p1();
>        if (c2) p2();
>        ...
>        if (cN) pN();
>    }
>
>The Boolean variables c1, c2, ... cN are assumed to be loop invariants.

(...version B deleted...)

>The following lets you get both (version C):
>
>    start generating code;
>    if (c1) generate instruction to call p1;
>    if (c2) generate instruction to call p2;
>    ...
>    if (cN) generate instruction to call pN;
>    for (i = 0; i<L; i++) execute generated instructions;

Interesting.  This reminds me a lot of the kind of compile-time
manipulation that optimizing compilers do.  The loop

    for (i = 0; i<L; i++) execute generated instructions;

is an optimized version of A *when the ci are known*.  If they were known
at compile-time, an optimizing compiler could have generated the same
code.  The only difference is that the self-modifying program does this
*at runtime*, when we are guaranteed to know the values of the ci's.

Maybe we should call this technique "run-time compilation"?

Bjorn Lisper
smryan@garth.UUCP (Steven Ryan) (07/16/88)
>'When will you guys realize that there is "a rat" in sep*arat*e?'
Why do you use *realism* and *realist* but then switch to *realize*?
s m ryan
-------------------------------------------
Self-descriptive sentences are irritating.
pardo@june.cs.washington.edu (David Keppel) (07/16/88)
In article <1087@ficc.UUCP> peter@ficc.UUCP (Peter da Silva) writes:
>[ Why are two caches (I, D) better than one (I+D)? ]

For a given real-estate, a non-snoopy I-cache will hold more data bits and
be faster than a snoopy I-cache.

+ For a given real-estate, the I-cache hit rate will be better.  This
  makes the average instruction fetch time lower.  This may be a win if
  instruction fetch rate is at all a bottleneck.
+ For a given cache size, the real-estate can be smaller.  This means:
  + The cache may be cheaper.
  + The cache may be small enough to put closer to the instruction
    register, making it effectively faster.
+ The logic is simpler and more regular.
  + Faster design times.
  + Fewer bugs.
  + Easier to incorporate into VLSI designs (== faster).
  + Less logic to drive => faster, less power.
+ The cache doesn't need to be connected to as many things.
  + More freedom to place the cache => faster, cheaper.
  + Less external logic to drive => smaller, faster, cheaper.
+ Aliasing and data distribution are less of a problem (or none at all).
  This lets you build (with much less pain):
  + Hierarchical cache organizations (faster).
  + Virtually-addressed caches (faster).

I-caches are particularly useful on machines that make heavy use of
memory.  This includes:

+ Many high-performance processors.
+ Shared-memory multiprocessors.

The primary problem (that I'm aware of) with non-snoopy I-caches is that
you must manually flush the I-cache every time that an instruction is
changed.  (Strictly speaking, every time you change an I-space address
that has been touched since the last cold start.)

Does that answer your question ?-)

        ;-D on  ( Faster )  Pardo
peter@athena.mit.edu (Peter J Desnoyers) (07/16/88)
In article <12382@ut-sally.UUCP> nather@ut-sally.UUCP (Ed Nather) writes: [talking about properly written self-modifying code] > >It would be even better if it could be done in a HLL like, say, C -- >with dangerous and confusing possibilities sharply restricted by the >language itself so the resulting code can be readily understood. The example that started this whole conversation was partial application. That IS "normally" incorporated into a language, although usually obscure, esoteric ones. Or else in Lisp, where it is partly supported as closures. The example faked partial application in C, which, like recursion in FORTRAN, is doable (on some architectures) but ugly. The necessity of using self-modifying code to implement partial application does not make it bad programming practice, any more than the necessity of using ML gotos to implement 'if' makes if-then statements bad programming practice. However, if you can't isolate the grungy part in the compiler (preferably) or a system call, then any advantages of this programming paradigm may be lost in the complexity (and danger) of using it. Peter Desnoyers peter@athena.mit.edu
petolino%joe@Sun.COM (Joe Petolino) (07/16/88)
>> Why are an Icache plus a Dcache better than just >> a big shared cache as big as both? > >In terms of hit/miss ratios, a unified cache is clearly better. I beg to differ. We ran a few simulations, comparing split and unified caches (total cache size in the 32K-512K range), and when the caches were direct-mapped (i.e. 1-way set-associative) the unified cache performed worse. Our guess is that when code and data are forced to co-exist in the same space, you get a high probability of collision and thrashing. This effect went away when we increased the degree of set associativity. One way to think about it is that having two (direct-mapped) caches gives you some of the benefits of set-associativity. Take this with a grain of NaCl: we only tried a few test programs, and each was small enough to fit into the cache. -Joe
fst@mcgp1.UUCP (Skip Tavakkolian) (07/16/88)
In March/April 1987 issue of AT&T TECHNICAL JOURNAL (volume 66, issue 2) there is an article about ``PICO - A PICTURE EDITOR'' by Gerard J. Holzmann. Its description of PICO is ``an interactive editor for digitized graphic images''. The interesting point about the VAX-750 version of this editor is that it ``contains an optimizing compiler that translates the expressions "on-the-fly" (interactively) into efficient code for the VAX-750 computer''. Author acknowledged Ken Thompson as the person who wrote the ``on-the-fly'' optimizing compiler. Is this considered self-modifying? If so, could one of the elders (a term of endearment) please shed light on this subject? Sincerely -- Fariborz ``Skip'' Tavakkolian UUCP ...!uw-beaver!tikal!mcgp1!fst
tps@chem.ucsd.edu (Tom Stockfisch) (07/16/88)
In article <33652@yale-celray.yale.UUCP> lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) writes:
>In article <1744@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes:
>>There are interesting cases where on-the-fly generation of code seems
>>to be essential to get good asymptotic space and/or time complexity.
>>
>>    for (i = 0; i<L; i++)
>>    {   if (c1) p1();
>>        if (c2) p2();
>>        ...
>>        if (cN) pN();
>>    }
>>[SMC solution follows]
>>    start generating code;
>>    if (c1) generate instruction to call p1;
>>    if (c2) generate instruction to call p2;
>>    ...
>>    if (cN) generate instruction to call pN;
>>    for (i = 0; i<L; i++) execute generated instructions;

Wouldn't the following be "nearly" as efficient, and be portable to boot?
At least it has the same asymptotic complexity.

    enum { END, P1, P2, ..., Pn } code[MAXCODE], *p;

    p = code;
    if (c1) *p++ = P1;
    if (c2) *p++ = P2;
    ...
    if (cn) *p++ = Pn;
    *p = END;

    for ( pixel = 0; pixel < NPIXELS; pixel++ )
        for ( p = code; *p != END; )
            switch (*p++)
            {
            case P1:
                [p1 instructions]
                break;
            case P2:
                [p2 instructions]
                break;
            ...
            case Pn:
                [pn instructions]
                break;
            default:
                assert(0);
                /*NOTREACHED*/
            }

-- 

|| Tom Stockfisch, UCSD Chemistry       tps@chem.ucsd.edu
sean@lfcs.ed.ac.uk (Sean Matthews) (07/16/88)
With this talk about self modifying code, no one has mentioned the
theoretical work that has been done on the subject.

There are two quite well known (and quite controversial) papers on the
subject of programs that can modify themselves and their interpreters, by
Brian Smith:

    Reflection and Semantics in a procedural language,
    MIT-LCS-TR-272, Mass. Inst. Tech., January 1982

    Reflection and semantics in Lisp,
    11th ACM symposium on principles of programming languages

Then there is the reply to these papers by Mitchell Wand and Daniel
Friedman:

    The mystery of the tower revealed: a non-reflective description of
    the reflective tower, CACM 1986 (this is as far as I can go, since I
    have a copy of a copy and there is no publishing information on it)

An extended version of this paper was published in:

    Meta-level architectures and reflection,
    P. Maes and D. Nardi (editors),
    Elsevier Science Publishers B.V. (North-Holland) 1988

There is a lot of theory about self modifying systems and self referential
systems, but by the time you start looking into it you are in
philosophical logic, not comp.architecture.

Se\'an Matthews

arpa: sean%uk.ac.ed.aipna@nss.cs.ucl.ac.uk
mash@mips.COM (John Mashey) (07/16/88)
In article <1087@ficc.UUCP> peter@ficc.UUCP (Peter da Silva) writes:
>        Why are an Icache plus a Dcache better than just
>        a big shared cache as big as both?

1) (Major) With split caches, you can access a word of instruction and a
word of data in the same cycle.  With a joint cache, you get one or the
other.  As a truly gross estimate, expect to lose 10-40% of your
performance, given conflicts between I-refs and loads/stores.

2) Less important, but still useful.  As you make a direct-mapped cache
bigger, each successive 2X improves the hit rate, but it improves it less
than the last 2X.  At some point, it is hard to make the cache bigger and
keep the same speed SRAMs, and then the only way (keeping the same
organization) to make the hit rate better is to make the cache
set-associative.  Given the same total cache size, hit rates for many
programs (not all, there are exceptions):

    joint, direct-mapped <= split, direct-mapped <= joint, 2-way set associative

Note: those are hit rates, not performance.  There are a bunch of reasons
to keep caches (at least the ones nearest the CPU) direct-mapped when you
can.  A fine paper that analyzes the various tradeoffs in cache design
[rather than just hit rates] is:

    S. Przybylski, M. Horowitz, J. Hennessy, "Performance Tradeoffs in
    Cache Design", Proc. 15th Annual Symposium on Computer Architecture,
    May 30 - June 2, 1988, Honolulu, Hawaii.  (Computer Architecture News
    16, 2 (May 1988), 290-298.)

-- 
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
bill@proxftl.UUCP (T. William Wells) (07/16/88)
In article <60175@sun.uucp>, petolino%joe@Sun.COM (Joe Petolino) writes: > >> Why are an Icache plus a Dcache better than just > >> a big shared cache as big as both? > > > >In terms of hit/miss ratios, a unified cache is clearly better. > > I beg to differ... > ...Take this with a grain of NaCl: we only > tried a few test programs, and each was small enough to fit into the cache. Another relevant point about separate caches is this: instructions and data have different access patterns. A cache designed for the one is not necessarily going to be right for the other; a cache designed for both may not do as well as the separate ones. So this possibility has to be balanced against the general benefit of having a larger pool to work with.
sl@van-bc.UUCP (pri=-10 Stuart Lynne) (07/17/88)
In article <752@cernvax.UUCP> hjm@cernvax.UUCP () writes:
>I have been mulling over the idea of self-modifying code (SMC) for a while and
>I've come to the conclusion that there is no good definition of SMC.

>For example, if treating code as data is the definition, then does passing a
>procedure as a parameter in PASCAL, or a pointer to a function in C count?
>Probably not.

>OK, what about a jump table.  Consider an array of pointers to functions in C.
>Does changing the pointers count as SMC?  Again, I don't think so.

>So, changing a pointer by assigning to it is not SMC, but putting a new jump
>instruction in (e.g. jmp #somewhere_else) in place of an existing instruction
>is SMC.  Does one level of indirection really make that much difference?

A simple definition might be: any program that cannot be compiled and run
in shared text mode (or the equivalent for non-Unix application
environments).

Modifying a jump table in your data space does not affect how the program
will run for other users.  Modifying a jump instruction in the shared text
*will* affect how the program will run for other users.

I have used SMC in places that were by design not required to be shared
and where a high degree of efficiency was required.  For example, in a
p-System type interpreter you need to have a test in the opcode fetch and
dispatch loop for a pseudo interrupt.  This takes a lot of time.  By
simply patching in a jmp to_int as the first instruction of the loop we
eliminate the need for an explicit test.

To make the above example work for multi-user systems, we used a jump
indirect through a var which contained either the ifetch address or the
interrupt handler address.  A little slower, but still faster than an
explicit test in the ifetch loop.

Similar types of problems can arise when doing high performance device
drivers.

-- 
Stuart.Lynne@wimsey.bc.ca {ubc-cs,uunet}!van-bc!sl Vancouver,BC,604-937-7532
leo@philmds.UUCP (Leo de Wit) (07/17/88)
In article <752@cernvax.UUCP> hjm@cernvax.UUCP () writes:
>I have been mulling over the idea of self-modifying code (SMC) for a while and
>I've come to the conclusion that there is no good definition of SMC.
>
>For example, if treating code as data is the definition, then does passing a
>procedure as a parameter in PASCAL, or a pointer to a function in C count?
>Probably not.

True.  A pointer is data.

>OK, what about a jump table.  Consider an array of pointers to functions in C.
>Does changing the pointers count as SMC?  Again, I don't think so.

Again true.  Note that it may be possible that the system (O.S.,
processor) checks whether the new pointer value represents a valid
address, or a valid entry point.  Whether this is (always) desirable is an
entirely different question.  E.g. some architectures will protest if you
try to jump to an odd address.  PASCAL will probably leave you less room
to cheat than C.

>So, changing a pointer by assigning to it is not SMC, but putting a new jump
>instruction in (e.g. jmp #somewhere_else) in place of an existing instruction
>is SMC.  Does one level of indirection really make that much difference?

Yes, I think it does make a difference.  Maybe not always, but there are
cases where you can't get away with this: think of re-entrant code, or
shared text segments.  Now I'm thinking of recursion -- what about putting
the code on the stack 8-) ?  No worry about re-entrancy, and the C program
model becomes more complete (we already have global, static and automatic
data, and global and static functions, and now there's automatic
functions...).

>Of course, if you want to be really horrid in C, you can try something like
>this:
>
>char codearray[] = { 0x03,  /* one byte instruction for something */
>                     0xc9   /* one byte return instruction */
>                   };

This must be (or was) a Z80 freak!  At least I recognize the 0xc9 as: RET.
On a Z80 you could do other horrible things too, since instruction sizes
vary from 1-4 bytes; by carefully picking your instructions you could use
the same code twice (using a one-off entry point), with entirely different
results.  Good (?) old days, when memory was more expensive than
programming effort....

>and then 'call' this function using a cast to turn the pointer codearray into
>a pointer to a function.  (Who needs #asm anyway!)  Then you can modify the
>code as much as you want.  This _is_ SMC without a doubt, because you can
>overwrite code.  So, I propose a very weak definition for SMC as code that
>writes over other code.

Note that not all implementations will allow initialized arrays to be
altered.  I remember a problem we had last year on VMS while passing an
initialized char array to mktemp() (or whatever its name is); the program
stackdumped because mktemp tried to write into the readonly array (yes,
VMS put it in readonly!).

>As a final note, why is it 'clean' to alter a jump table and 'unclean' to alter
>an inline constant (e.g. jmp @offset(r0) uses a value in memory as the address
>but mov (pc)+,#1234 which loads an immediate does so too)?  Why the subtle
>difference?  Any thoughts on the subject?

Try putting your code in ROM and see what happens.  Just one example.
Besides, I think jump tables are generally not altered; pointers are.  The
jump tables could well be in the text segment, for instance.  A jump table
is not an array of function pointers.

>        Hubert Matthews
>
>(I don't consider LISP or PROLOG programs that create code on the fly to be
>SMC.  Does anyone disagree?)

No, unless the code is buggy; such code on a fly could well be Self
Multiplying 8-).

        Leo.
smryan@garth.UUCP (Steven Ryan) (07/17/88)
>As to the semantics of self-modifying code, isn't it identical to >a Turing machine's or am I missing something? No. The state mechanism of a Turing machine is fixed and finite. To show equivalence, it is necessary to show that self modifying code does not increase the number of states. Usually, people just appeal to Church-Turing and assume all the states are there, somewhere. For something like a load and go compiler, since every possible program is not hidden somewhere in the compiler, it would appear to escape CT. However the compiled program can be regarded as a transformation of the input tape into another region of the tape and its apparent execution as an interpretation of the tape by the compiler/loader/cpu state machine. Then again, if you believe really hard and clap your hands..... sm ryan ---------------------------------- Hugh: Why does he carry a sword? Fred: He has delusions of grandeur. him: Uh, Fred... (SWOOP) Fred: YEEK him: ...before you make a value judgement about a person's delusions... (poink) Fred: AWK! My head feathers! him: ...you'd better be absolutely sure they ARE delusions. (JAB) Fred: OUCH! -- Odd Bodkins
elg@killer.DALLAS.TX.US (Eric Green) (07/17/88)
in article <4776@killer.UUCP>, chasm@killer.UUCP (Charles Marslett) says: > In article <33441@yale-celray.yale.UUCP>, lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) writes: >> Another related issue is that self-modifying code cannot be executed >> directly from ROM. Executing programs in ROM is an important memory-saving >> technique in small, diskless, special-purpose systems. > > Actually, I always thought that ROM was much more important as a way of > eliminating the need to load the code and as a way of guaranteeing non- > self-modifying-ness. Usually ROM code is larger (thus memory-consuming) > than RAM-loaded code. Uh, I don't know what systems you're talking about, but for the small microcontrollers that I've used -- the ROM is assembled to be placed in the memory map at an absolute location. That is, it takes up the same amount of space whether it's in ROM or RAM (in fact, one of my favorite hardware designs is a little SRAM card that I use to spoof ROM for prototyping purposes). As for re-using memory buffers etc., those are always in RAM, so those are always re-usable -- no matter where your executable is loaded. 32K bytes of ROM costs about 1/4th of what 32K bytes of static RAM costs (not to mention the eliminating-need-to-load). Note that these are dedicated control systems using 6502's, Z-80's, and 6809's, which have a total address space of 64K -- so 32K of ROM is a heckuva lot. > Seriously, newer architectures have reduced the difference (to 0 perhaps), > but the emphasis on RISC these days may resurrect self-modifying code -- > a RISC-er technique is not known to mankind! (:-D}===<<). Actually, RISC may be the stake-in-the-heart of self-modifying code. RISC technology overcomes the increased memory bandwidth requirement by using enhanced cache technology, 256-byte-at-a-time prefetch from page-mode DRAM's, heavy pipelining, and every other trick in the book. Most of which are quite contrary to the thought of self-modifying code. 
Another RISC trick is to use the saved silicon for extra registers, and
have the compilers keep as many values as possible in the CPU registers.
You can always do a one-word memory-indirect load off of a register much
faster than you can do a "memory-write (the address into the instruction),
memory-read (the address part of the modified instruction), memory-read"
triplet (which is at least three two-word instructions, vs. a single
one-word instruction, on a machine where one cycle = one word processed).

But I HAVE done self-modifying code on a 6502-based microcontroller (out
of sheer necessity).  I still make an occasional enhancement to that
software, over 3 years later, as new applications arise, and have had no
problems with readability.  After all, in assembly on a microcontroller
you have to save the address somewhere, and the fact that the "somewhere"
happens to be inside an assembly language instruction makes little
difference.  The argument against self-modifying code lies elsewhere
besides readability.

--
Eric Lee Green    ..!{ames,decwrl,mit-eddie,osu-cis}!killer!elg
          Snail Mail P.O. Box 92191 Lafayette, LA 70509
MISFORTUNE, n. The kind of fortune that never misses.
g-rh@cca.CCA.COM (Richard Harter) (07/17/88)
In article <989@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>>As to the semantics of self-modifying code, isn't it identical to
>>a Turing machine's or am I missing something?
>No. The state mechanism of a Turing machine is fixed and finite.
>To show equivalence, it is necessary to show that self modifying code
>does not increase the number of states. Usually, people just appeal
>to Church-Turing and assume all the states are there, somewhere.

This needs some qualification. Strictly speaking no computers are Turing machines because they do not have an infinite memory or equivalent. However almost all computers would be universal Turing machines if they somehow did.

However it depends on what level you are speaking of. If you are talking about the hardware, that's a *qualified* Turing machine. You can talk about a language being a virtual Turing machine; that makes sense, albeit you want to be more careful with your language than I am being here. The language remains fixed, and most languages are universal Turing machines. In this case the SMC is data. If you want to talk about a specific program as a virtual Turing machine (probably not universal) then, yes, it can't modify itself.

--
In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
Richard Harter, SMDS Inc.
conybear@moncsbruce.oz (Roland Conybeare) (07/18/88)
From article <752@cernvax.UUCP>, by hjm@cernvax.UUCP (hjm):
> As a final note, why is it 'clean' to alter a jump table and 'unclean' to
> alter an inline constant (e.g. jmp @offset(r0) uses a value in memory as the
> address but mov (pc)+,#1234 which loads an immediate does so too)? Why
> the subtle difference? Any thoughts on the subject?
>
> Hubert Matthews

I can see several reasons.

* the big, big reason for referring to code via pointers, and getting the effect of self-modifying code via such pointers, is that you make your changes independent of the size of the code. Real SMC will only work when the new code is no larger than the old code. I think this is a very restrictive assumption.

* when you alter a jump table (in C, at least) you are doing so within the language, and can expect the compiler to understand you. When you alter an instruction, like mov (pc)+,#1234, you are assuming that the change you make is meaningful on the particular machine: a language which allows you to modify instructions directly would of necessity depend strongly on the machine architecture to run these instructions. Otherwise, why don't we all use Universal Assembly Language?

Roland Conybeare
conybear@moncsbruce.oz
daveb@geac.UUCP (David Collier-Brown) (07/18/88)
From article <33441@yale-celray.yale.UUCP>, by lisper-bjorn@CS.YALE.EDU (Bjorn Lisper):
> In article <12357@ut-sally.UUCP> nather@ut-sally.UUCP (Ed Nather) writes:
>>In article <5254@june.cs.washington.edu>, pardo@june.cs.washington.edu
>>(David Keppel) writes:
[a long, and generally well-reasoned debate elided]
>>Sorry, no. I've heard LOTS of arguments against programs that generate their
>>own code,

Gentles, could we ***PLEASE*** reserve the term "self-modifying code" for code which actually modifies itself on the fly (eg, for generating indexes by instruction modification instead of using index registers)?

Much of what is being discussed here is part of the well-known and respectable "generate and execute" paradigm. The only difference between this and the normal code-generation paradigm is that an application generates code at run-time, not a compiler at a previous time. Mixing the two is making this discussion "noisy".

--dave (sorry about the pedantry, but it's important) c-b
--
David Collier-Brown. {mnetor yunexus utgpu}!geac!daveb
Geac Computers Ltd., | Computer science loses its
350 Steelcase Road, | memory, if not its mind,
Markham, Ontario. | every six months.
baum@Apple.COM (Allen J. Baum) (07/19/88)
[] For those of you interested in on-the-fly generated code, here is a reference to an interesting article: Turning interpreters into Compilers Frank Pagan -C.S.U. Fullerton June 88 Software- Practice and Experience -- {decwrl,hplabs,ihnp4}!nsc!apple!baum (408)973-3385
hwfe@ur-tut (Harlan Feinstein) (07/19/88)
Reasons for writing self-modifying code have been scarce, and have basically centered around speed. But speed is not the only reason someone would want to write s-m code.

I've seen a program for the IBM PC called PC-LOCK that prohibits one from accessing the hard disk without providing a password. The install and de-install programs are self-modifying to make tracing them all but impossible. One time last summer I changed my password right before the weekend, and wouldn't you know it, come Monday I had no clue as to what the password was. I tried all kinds of disassembly and debug tracing to no avail.

Hence, another reason for self-modifying code: obfuscation.

--harlan
hjm@cernvax.UUCP (hjm) (07/19/88)
Programmers have become quite accustomed to the destructive assignment of data ( x := y + 2 ), so why should the destructive changing of code be any different? If the practice were more common, then it wouldn't seem quite so bad, surely. After all, a Turing machine with a finite length tape is not *that* difficult ... Hubert Matthews
colwell@mfci.UUCP (Robert Colwell) (07/21/88)
In article <759@cernvax.UUCP> hjm@cernvax.UUCP () writes: > >Programmers have become quite accustomed to the destructive assignment of >data ( x := y + 2 ), so why should the destructive changing of code be any >different? If the practice were more common, then it wouldn't seem quite so >bad, surely. After all, a Turing machine with a finite length tape is not >*that* difficult ... Hubert, I was going to let this whole SMC discussion slide on by, but your post finally convinced me otherwise. First, you meant "y" := y + 2, not "x", right? Second, it seems like only yesterday when we (the royal we) CPU architects were so concerned with trying to narrow the semantic gap between what a programmer was trying to express and what the machine/language was coercing her into. Languages like Ada and machine architectures like capability machines were intended to address this perceived need. The basic idea is that (oversimplifying drastically) in a small programming environment (you at your standalone workstation) anything goes in terms of protection and language type checking, etc. The interesting domain is that of programming-in-the-large, with large teams of programmers producing hundreds of thousands of lines of code (SDI/Shuttle). There, extracting the last nano-mip of performance is of far less interest than in avoiding bugs and producing maintainable code, and I still believe that supporting that environment is a far more important challenge to architects than worrying about SMC could ever be. And I don't think (heavy sarcasm here) that UNIX/C is the ultimate answer to that environment. Bob Colwell mfci!colwell@uunet.uucp Multiflow Computer 175 N. Main St. Branford, CT 06405 203-488-6090
guy@gorodish.Sun.COM (Guy Harris) (07/21/88)
> Second, it seems like only yesterday when we (the royal we) CPU > architects were so concerned with trying to narrow the semantic gap > between what a programmer was trying to express and what the > machine/language was coercing her into. Languages like Ada and > machine architectures like capability machines were intended to > address this perceived need. A naive (and not rhetorical) question: what evidence is there to indicate the degree to which "narrowing the semantic gap" with capability machines and the like would improve the productivity of programmers or the reliability of programs, and to which other techniques (e.g., making a very fast conventional machine, possibly but not necessarily using RISC technology, and using that speed to improve the programming environment with better software) achieve the same goal?
peter@athena.mit.edu (Peter J Desnoyers) (07/21/88)
In article <60782@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes: > >A naive (and not rhetorical) question: what evidence is there to indicate the >degree to which "narrowing the semantic gap" with capability machines and the >like would improve the productivity of programmers or the reliability of >programs, and to which other techniques [fast machines, good software] >achieve the same goal? Some protection and tracing features are MUCH slower in software. These features are also useful in writing wonderful debuggers. You need a bus snooper or ICE with an 8088 to achieve debugger features that can be done in software on the 68020 or 80386. And per-task memory protection is a big win in any environment, but impossible in software. Peter Desnoyers peter@athena.mit.edu
guy@gorodish.Sun.COM (Guy Harris) (07/22/88)
> >A naive (and not rhetorical) question: what evidence is there to indicate
> >the degree to which "narrowing the semantic gap" with capability machines
> >and the like would improve the productivity of programmers or the
> >reliability of programs, and to which other techniques [fast machines, good
> >software] achieve the same goal?
>
> Some protection and tracing features are MUCH slower in software.
> These features are also useful in writing wonderful debuggers. You
> need a bus snooper or ICE with an 8088 to achieve debugger features
> that can be done in software on the 68020 or 80386. And per-task
> memory protection is a big win in any environment, but impossible in
> software.

OK, maybe I didn't make myself clear. I'm not referring to features you find on many random processors or systems out there, such as memory management units or "trap on transfer of control/trap on reference to particular location" breakpoints. I'm referring to the sort of architectural features you find in machines that, say, require references to objects to go through a "descriptor" for the object, to do bounds checking and the like.

One instantiation of the question would be "are you better off doing that or having a fast machine plus a compiler that can e.g. generate bounds checking code but avoid doing it in some cases where it's 'known' that you don't need it?" For instance, don't bother doing it in

	double a[SIZE], b[SIZE], c[SIZE];

	for (i = 0; i < SIZE; i++)
		a[i] = b[i] + c[i];

(or the FORTRAN, or Ada, or COBOL, or... equivalent), since you know "i" will always be within the bounds of all the arrays.

Not "do you think you'd be better off" or "do you have an analysis that makes it 'intuitively obvious' that you'd be better off", but "has anybody compared actual implementations of the two approaches, and concluded that, with everything taken into account, you're better off with one or the other?"
I.e., taking into account the fact that microcode and hardware is like software in that it is possible for it to be buggy (either designed incorrectly or implemented incorrectly), and taking into account the time it takes to design that system, and....
colwell@mfci.UUCP (Robert Colwell) (07/22/88)
In article <60782@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes: >> Second, it seems like only yesterday when we (the royal we) CPU >> architects were so concerned with trying to narrow the semantic gap >> between what a programmer was trying to express and what the >> machine/language was coercing her into. Languages like Ada and >> machine architectures like capability machines were intended to >> address this perceived need. > >A naive (and not rhetorical) question: what evidence is there to indicate the >degree to which "narrowing the semantic gap" with capability machines and the >like would improve the productivity of programmers or the reliability of >programs, and to which other techniques (e.g., making a very fast conventional >machine, possibly but not necessarily using RISC technology, and using that >speed to improve the programming environment with better software) achieve the >same goal? As far as I know, there is no evidence that you would necessarily find compelling, but then I could say that same thing about almost everything else in this field, too. There are, on the other hand, some good reasons to believe that we can do better than imperative languages running on essentially unprotected architectures. I know I can't do this topic justice in this forum, but here's my quick take on the subject. Think about the different computer languages you have used, and what was good or bad about each. Backus argued (in his famous paper on functional languages) that one of the reasons that Lisp is a good language is that you don't have to mentally execute a program (as you do with imperative languages) in order to convince yourself that it will do what is wanted. The language allows you to more closely express what is desired, so you test correctness by inspection. And yes, there are cases where it's more obvious/easier-to-express something in C than Lisp. 
But both examples serve to illustrate the point -- you, as a programmer, have some virtual entity you want to realize (data structure or algorithm), and the closer you can get to realizing exactly what you have in mind, the more likely the code is to be correct, maintainable, and understandable (and the smaller the semantic gap between what you want and what you can express). That's partly an allegory. The capability people argue that the same thing extends into all areas of computer systems. Recall the classic arguments about turning off runtime bounds checking to reclaim that lost performance -- why should a programmer, in the best of all possible worlds, have to worry about things like that? It doesn't help any of the major productivity metrics. Say what you want about Ada on the 432 (and I've said plenty already), as a programming environment (forget the speed problems temporarily) it was really nice. Your program had access to only the data structures you specified, and only in the ways you specified, and if your code screwed up, the machine could tell you where and in what way. To me, the hardest bugs to find are those where you fix one bug (in someone else's code, of course) and inadvertently break it in some obscure way such that something completely unrelated is getting trashed. For this to even be possible (in my opinion) means that the programming environment is lacking something crucial, notwithstanding that about 95% of all programmers on this planet see just such an environment. The environment is failing to express what the programmer wanted, and it's a combined failure of the machine architecture, the language, and probably the OS too. The semantic gap argument says that the more the desired and achieved environments differ, the larger the quantity of bugs, and the worse their obscurity will be. 
I know you're tempted at this point to say that even if one grants this as a shortcoming of current architectures/environments, there have been no successes so far at addressing it. That's another topic of conversation, I think; all I'm trying to do here is offer some of the justification for why a lot of research has gone into other ways to compute (that don't have sheer instrs-executed-per-unit-time as their primary figure of merit). All the strong-type-checking stuff built into some languages has the same motivation as above. For me, the bottom line is that, as usual, there aren't any easy answers (because if there were somebody would've patented them by now), but we shouldn't lose track of the corresponding questions just on that basis. The problem is getting worse, after all -- more, not fewer, lives currently depend on the correctness of software than ever before, and that trend will continue (fly-by-computer airplanes, dynamically-unstable planes, military systems, ...). I assume that the reason people like you and me are concentrating on performance is that that's what sells. I don't think that needs defending, but I also see few reasons to believe that sheer performance alone will provide the cures to the problems I tried to outline above. Bob Colwell mfci!colwell@uunet.uucp Multiflow Computer 175 N. Main St. Branford, CT 06405 203-488-6090
henry@utzoo.uucp (Henry Spencer) (07/24/88)
In article <473@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes: >... The capability people argue that the same >thing extends into all areas of computer systems. Recall the classic >arguments about turning off runtime bounds checking to reclaim that >lost performance -- why should a programmer, in the best of all >possible worlds, have to worry about things like that? ... My counterargument is that it is almost as much of an imposition on the programmer to have such checks done at runtime as it is to have them not done at all. Given the impossibility of complete testing, there is no way to guarantee such tests will catch a bug during development. This means that the programmer has to go through the same complicated exercise of trying to assure himself that the problem will never happen. What is really wanted is a way to check these properties at COMPILE TIME... in which case the runtime checks become largely superfluous anyway. Can it be done? Well, in one sense the answer is clearly yes, because a proof of program correctness has to include it, and we know that automating such proofs is possible (although seldom practical at the moment). The question is whether it can be done with practical, affordable machinery without crippling the language. My own long-held conjecture is that the answer is "yes", but I don't have proof of that yet. -- Anyone who buys Wisconsin cheese is| Henry Spencer at U of Toronto Zoology a traitor to mankind. --Pournelle |uunet!mnetor!utzoo! henry @zoo.toronto.edu
colwell@mfci.UUCP (Robert Colwell) (07/24/88)
In article <1988Jul23.221743.22169@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >In article <473@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes: >>... The capability people argue that the same >>thing extends into all areas of computer systems. Recall the classic >>arguments about turning off runtime bounds checking to reclaim that >>lost performance -- why should a programmer, in the best of all >>possible worlds, have to worry about things like that? ... > >My counterargument is that it is almost as much of an imposition on the >programmer to have such checks done at runtime as it is to have them not >done at all. Given the impossibility of complete testing, there is no >way to guarantee such tests will catch a bug during development. This >means that the programmer has to go through the same complicated exercise >of trying to assure himself that the problem will never happen. What is >really wanted is a way to check these properties at COMPILE TIME... in >which case the runtime checks become largely superfluous anyway. We probably agree that the compiler should check as much as it possibly can; the earlier an error is detected, the less the ambiguity about what is wrong and the cheaper the cure. But there is an awful lot the compiler can't know about -- programs that use input data, generate pointers, or do just about anything else that's interesting, are going to do many operations that are difficult for the compiler to validate at compile time. I'm not sure I catch your drift on the imposition of runtime checks. To me, the best human debuggers have the ability to figure out the fastest which assumptions are being made that are wrong, thus getting to the heart of some problem the quickest. If I have a program bug, I want the machine to express as directly as possible the intent of my program so that I can narrow the bug-space down to my code alone. 
If the machine allows a change to one variable (an array, say) to affect some other unrelated variable, then it is not conforming to the intent of my program. In fact, it is not conforming to anything useful at all, since nobody would argue that it is useful programming practice to ever do such a thing on purpose (I hope?). Given that, the fact that the machine can do such a thing, let alone do it as a default, is what I'd label a shortcoming in the basic architecture. One we've all long ago learned to live with, to be sure, but it's not something to be proud of, at very least. >Can it be done? Well, in one sense the answer is clearly yes, because a >proof of program correctness has to include it, and we know that automating >such proofs is possible (although seldom practical at the moment). The >question is whether it can be done with practical, affordable machinery >without crippling the language. My own long-held conjecture is that the >answer is "yes", but I don't have proof of that yet. I sure hope you're right. In fact, if progress towards this goal becomes evident, I'd propose we start discussing ways of turning architecture towards better ways of supporting this instead of throughput or programming environments. Bob Colwell mfci!colwell@uunet.uucp Multiflow Computer 175 N. Main St. Branford, CT 06405 203-488-6090
ge@hobbit.sci.kun.nl (Ge' Weijers) (07/25/88)
From article <1087@ficc.UUCP>, by peter@ficc.UUCP (Peter da Silva): > I have a question: > > Why are an Icache plus a Dcache better than just > a big shared cache as big as both? The answer is bandwidth. The CPU does not have to stop filling the instruction pipeline when it accesses/writes data. -- Ge' Weijers, Informatics dept., Nijmegen University, the Netherlands UUCP: {uunet!,}mcvax!kunivv1!hobbit!ge
guy@gorodish.Sun.COM (Guy Harris) (07/26/88)
> We probably agree that the compiler should check as much as it
> possibly can; the earlier an error is detected, the less the
> ambiguity about what is wrong and the cheaper the cure. But there is
> an awful lot the compiler can't know about -- programs that use input
> data, generate pointers, or do just about anything else that's
> interesting, are going to do many operations that are difficult for
> the compiler to validate at compile time.

In the case of programs that use input data, said programs should validate the input data before using it (unless they have good reason to know the data will *never* be incorrect); I would prefer to see

	input, line 12: value "12038" is out of range (should be between
	17 and 137)

than to see

	ILLEGAL ARRAY REFERENCE: "frobozz.q", line 31
	Subscript was 12038; should be between 17 and 137

as the former, at least, tells me that the ultimate problem is in the data, not the code. Could a compiler know to remove run-time subscript checks in:

	i = read();
	if (i < 17 || i > 137)
		fail("input, line %d: value \"%d\" is out of range (should be between 17 and 137)", lineno, i);
	printf("input value for %d is %d\n", i, array[i]);

If so, it should probably issue warnings when it *isn't* able to remove run-time subscript checks, to inform programmers that they should probably put those checks in themselves.

> >Can it be done? Well, in one sense the answer is clearly yes, because a
> >proof of program correctness has to include it, and we know that automating
> >such proofs is possible (although seldom practical at the moment). The
> >question is whether it can be done with practical, affordable machinery
> >without crippling the language. My own long-held conjecture is that the
> >answer is "yes", but I don't have proof of that yet.
>
> I sure hope you're right.
In fact, if progress towards this goal > becomes evident, I'd propose we start discussing ways of turning > architecture towards better ways of supporting this instead of > throughput or programming environments. Yes, but does this need "architectural" support, at least in the sense of "instruction set architecture"? If a compiler for a "conventional" machine can do that level of validation, subscript-range checking features in the instruction set would be seldom, if ever, used. If "architecture" includes the compiler, then I agree that "architectural" support for this would be nice.
smryan@garth.UUCP (Steven Ryan) (07/26/88)
>If the machine allows a change to one variable (an array, say) to
>affect some other unrelated variable, then it is not conforming to
>the intent of my program. In fact, it is not conforming to anything
>useful at all, since nobody would argue that it is useful programming
>practice to ever do such a thing on purpose (I hope?).

This particular example is done all the time in handling recursive data structures.

>>Can it be done? Well, in one sense the answer is clearly yes, because a
>>proof of program correctness has to include it, and we know that automating
>>such proofs is possible (although seldom practical at the moment).

Depends on whether all possible programs (including those of monkey programmers) or just `practical' programs are considered. Formal proofs of all possible programs are impossible, flat out, no appeal. Formal proofs of practical programs depend on how powerful practical has to be to remain useful.
levy@ttrdc.UUCP (Daniel R. Levy) (07/26/88)
In article <1988Jul23.221743.22169@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes: > -- > Anyone who buys Wisconsin cheese is| Henry Spencer at U of Toronto Zoology > a traitor to mankind. --Pournelle |uunet!mnetor!utzoo! henry @zoo.toronto.edu First it was NASA, now it's Wisconsin cheese. What gripe do you (and Pournelle for that matter) have against Wisconsin cheese? (The imagination boggles.) -- |------------Dan Levy------------| THE OPINIONS EXPRESSED HEREIN ARE MINE ONLY | AT&T Data Systems Group | AND ARE NOT TO BE IMPUTED TO AT&T. | Skokie, Illinois | |-----Path: att!ttbcad!levy-----|
colwell@mfci.UUCP (Robert Colwell) (07/26/88)
In article <61251@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes: >In the case of programs that use input data, said programs should validate the >input data before using it (unless they have good reason to know the data will >*never* be incorrect); I would prefer to see > > input, line 12: value "12038" is out of range (should be between > 17 and 137) > >than to see > > ILLEGAL ARRAY REFERENCE: "frobozz.q", line 31 > Subscript was 12038; should be between 17 and 137 > >Yes, but does this need "architectural" support, at least in the sense of >"instruction set architecture"? If a compiler for a "conventional" machine can >do that level of validation, subscript-range checking features in the >instruction set would be seldom, if ever, used. > >If "architecture" includes the compiler, then I agree that "architectural" >support for this would be nice. But the whole point of capability machines (to name a single example) is that one cannot cover the space of all interesting exception-possibilities using only a compiler, no matter how smart. For one thing, the program could be coercing data types back and forth such that checking an operation on a type can only be done at the time the operation is applied. But a more fundamental matter is how one manages the development of a large software project with dozens of programmers contributing to a single large end product. The modules are all separately compiled, so there is no question of the compiler helping out much. Given access to all the symbol tables, you could imagine the linker doing some reasonable checks of consistency (number and types of args, for instance), but even that fails when pointers are being passed (pointers to functions, even). You can catch a lot of the common cases with good programming style, as you note above. But you can't catch them all, and the question that capability machines pose is "how close can we come to an airtight programming environment, and how much would it cost"? 
(simplistic paraphrase, I know; maybe it'll draw some capability people out of their woodwork abstraction!) And how about functional languages? Again, the compiler can only do so much, and the space it covers is not intuitively obvious to the programmer. If it doesn't entirely cover the bug-space, and you aren't sure what it *does* cover, then the coverage is much less useful (and may not be useful at all, as Henry Spencer was alluding to in an earlier post.) Bob Colwell mfci!colwell@uunet.uucp Multiflow Computer 175 N. Main St. Branford, CT 06405 203-488-6090
guy@gorodish.Sun.COM (Guy Harris) (07/27/88)
> >Yes, but does this need "architectural" support, at least in the sense of > >"instruction set architecture"? If a compiler for a "conventional" machine > >can do that level of validation, subscript-range checking features in the > >instruction set would be seldom, if ever, used. > > > >If "architecture" includes the compiler, then I agree that "architectural" > >support for this would be nice. > > But the whole point of capability machines (to name a single example) > is that one cannot cover the space of all interesting > exception-possibilities using only a compiler, no matter how smart. > For one thing, the program could be coercing data types back and > forth such that checking an operation on a type can only be done at > the time the operation is applied. You missed the point. As I said in an earlier posting, you can spend some increased speed by putting in code to do the appropriate checks on a "conventional" machine. If the compiler can eliminate most of those checks, so much the better; you spend less of the increased speed. I don't see that, for example, doing these checks in microcode, as opposed to regular code, is *ipso facto* the only way to do things. In some ways, that's what I've been asking: what reason is there to believe that capability machines and the like are the *only* way, or even just the *best* way, to achieve the kind of programming environment I suspect most, if not all, of us would like to see? > But a more fundamental matter is how one manages the development of a > large software project with dozens of programmers contributing to a > single large end product. The modules are all separately compiled, > so there is no question of the compiler helping out much. > > Given access to all the symbol tables, you could imagine the linker > doing some reasonable checks of consistency (number and types of > args, for instance), but even that fails when pointers are being > passed (pointers to functions, even). I'm not so sure of that. 
I don't see how (assuming a "reasonable" language and "reasonable" linkers) doing the check at link time is intrinsically any different from doing it at compile time, if (as per the assumptions listed) the linker has all the information about types that was available to the compiler. I also don't see that pointers are that bad a problem, assuming pointers are typed (again, assuming a "reasonable" language); in ANSI C, for instance, "pointer to 'int'-valued function taking an 'int' argument" is a different type from "pointer to 'int'-valued function taking a 'float' argument". > You can catch a lot of the common cases with good programming style, > as you note above. But you can't catch them all, and the question > that capability machines pose is "how close can we come to an > airtight programming environment, and how much would it cost"? > (simplistic paraphrase, I know; maybe it'll draw some capability > people out of their woodwork abstraction!) I agree that you may not be able to catch all examples of incorrect code. However, if you can avoid doing any checking in those cases where you *can* catch them at compile time, you may be better off. The further questions that capability machines pose are: 1) How much "safer" are they than "conventional" machines plus compilers that generate run-time checks? and 2) If this extra safety is expensive, is it worth the cost? (Remember, the sort of embedded systems to which you've referred in the past do have real-time constraints, so there is presumably some minimum performance required; as the jobs they do get more sophisticated, the minimum performance required increases.)
smryan@garth.UUCP (Steven Ryan) (07/27/88)
In article <1075@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>>If the machine allows a change to one variable (an array, say) to
>>affect some other unrelated variable, then it is not conforming to
>>the intent of my program. In fact, it is not conforming to anything
>>useful at all, since nobody would argue that it is useful programming
>>practice to ever do such a thing on purpose (I hope?).
>
>This particular example is done all the time in handling recursive data
>structures.

What do I mean by that? [I do not know how to reply by private mail--student driver.]

A simple example is reversing a singly linked list and, just to make it interesting, inserting the list ordinal:

   p := nil;
   q := first_element;
   n := 1;
   while q isnt nil do
      r := q.link;      -- r and q.link are variables pointing at the same location.
      q.link := p;      -- p and q.link are now pointing at the same location.
      p := q;           -- now p and q.
      p.ordinal := n;   -- as a side effect, also modifies q.ordinal.
      q := r;           -- now r and q.
      n +:= 1
   od
ge@hobbit.sci.kun.nl (Ge' Weijers) (07/27/88)
From article <1848@van-bc.UUCP>, by sl@van-bc.UUCP (pri=-10 Stuart Lynne):
[About self-modifying code]
> A simple definition might be any program that cannot be compiled and run
> in shared text mode (or equivalent for non Unix application environments).

What about programs that write code into the data segment and then call it:

   int myproc[1000];
   .....                       /* generating s/m code */
   (*((int (*)())myproc))();   /* and calling it */

We need a better definition, I'm afraid.
--
Ge' Weijers, Informatics dept., Nijmegen University, the Netherlands
UUCP: {uunet!,}mcvax!kunivv1!hobbit!ge
baum@Apple.COM (Allen J. Baum) (07/28/88)
[] >In article <61459@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes: >I don't see that, for example, doing these checks in microcode, as opposed to >regular code, is *ipso facto* the only way to do things.... Presumably, hardware (done in parallel with fetches) is the way to go >... the questions that capability machines pose are: > > 1) How much "safer" are they than "conventional" machines plus > compilers that generate run-time checks? > >and > > 2) If this extra safety is expensive, is it worth the cost? In some sense, the cost is very little, probably not much more than your standard segmentation scheme. The hardware cost is usually a tag bit, which is fast, pretty cheap on chip, but possibly expensive in your memory system. -- {decwrl,hplabs,ihnp4}!nsc!apple!baum (408)973-3385
colwell@mfci.UUCP (Robert Colwell) (07/28/88)
In article <61459@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes: >> >Yes, but does this need "architectural" support, at least in the sense of >> >"instruction set architecture"? If a compiler for a "conventional" machine >> >can do that level of validation, subscript-range checking features in the >> >instruction set would be seldom, if ever, used. >> > >> >If "architecture" includes the compiler, then I agree that "architectural" >> >support for this would be nice. >> >> But the whole point of capability machines (to name a single example) >> is that one cannot cover the space of all interesting >> exception-possibilities using only a compiler, no matter how smart. >> For one thing, the program could be coercing data types back and >> forth such that checking an operation on a type can only be done at >> the time the operation is applied. > >You missed the point. As I said in an earlier posting, you can spend some >increased speed by putting in code to do the appropriate checks on a >"conventional" machine. If the compiler can eliminate most of those checks, >so much the better; you spend less of the increased speed. Actually, I didn't miss the point, I ignored it because I wanted to deal with something I consider more important. Yes you can put in more checks. And if you aren't going to let me get anything better than that for a programming environment, then I'll take it as an improvement over the C/Unixes that currently exist. But I have a lot of doubts about this approach. To me it's like observing that an earth dam has 300 holes, and proposing to fix them one-by-one, each hole needing some different fix from the one before. The whole point of capabilities (at least as the 432 implemented them) was to provide a seamless programming environment, with exactly the same abstraction (that of an "object") presented to the programmer, no matter in which direction she looked (left or right through the code, or down to the hardware). 
You can't get this by throwing a large number of bandaids at the
problem, even if you say they don't cost much in performance.

>I don't see that, for example, doing these checks in microcode, as opposed to
>regular code, is *ipso facto* the only way to do things.  In some ways, that's
>what I've been asking: what reason is there to believe that capability
>machines and the like are the *only* way, or even just the *best* way, to
>achieve the kind of programming environment I suspect most, if not all, of us
>would like to see?

Actually, I didn't say it was the only way.  In fact, all I've been
trying to argue is that there are reasons to believe that there are
things worth considering from that kind of research and that type of
programming environment.  Sometimes I feel like all we do is debate the
race cars, when it's the production cars that represent all the money.

[discussion about specific checks elided; this message is too long]

>I agree that you may not be able to catch all examples of incorrect code.
>However, if you can avoid doing any checking in those cases where you *can*
>catch them at compile time, you may be better off.  The further questions that
>capability machines pose are:

You won't catch me disagreeing with this, even for capability machines.
In fact, that was a major point of our article in the 13th Computer Arch
Symp (I think it was 13 -- the one in Tokyo).

> 1) How much "safer" are they than "conventional" machines plus
>    compilers that generate run-time checks?
>
>and
>
> 2) If this extra safety is expensive, is it worth the cost?

Absolutely.  I tried to quantify the cost of 432-style object
orientation in my thesis.  In a nutshell, I concluded that that style of
machine would be from 1 to 4 times slower than a conventional
unprotected architecture made of equivalent implementation technology
(1 times slower means same speed).  That's an order of magnitude faster
than the 432 really was, but that doesn't count (see next issue of ACM
TOCS to see why).
You could do their transparent multiprocessing and buy that factor of 4 back, but if you got above 6 or 7 you'd saturate the bus (memory-to-memory architecture). Things get hard to sort out after that. Is that performance hit worth it? Who knows? I'd say it's probably worth it a lot more often than most people currently think. An awful lot of code got written and debugged on machines that were a lot slower than the could-have-been 432. You have to allow for a natural tendency (which I think largely fuels this whole industry) to always want more and to refuse to ever go backwards in anything, speed, memory, disk, sw complexity... >(Remember, the sort of embedded systems to which you've referred in the past do >have real-time constraints, so there is presumably some minimum performance >required; as the jobs they do get more sophisticated, the minimum performance >required increases.) Yeah, well, the Space Shuttle doesn't exactly have blazing hardware. But it's running some of the most sophisticated software I know of. And telephone switchers aren't mini-crays by any means. Bob Colwell mfci!colwell@uunet.uucp Multiflow Computer 175 N. Main St. Branford, CT 06405 203-488-6090
guy@gorodish.Sun.COM (Guy Harris) (07/28/88)
> But I have a lot of doubts about this approach. To me it's like > observing that an earth dam has 300 holes, and proposing to fix them > one-by-one, each hole needing some different fix from the one before. > The whole point of capabilities (at least as the 432 implemented > them) was to provide a seamless programming environment, with exactly > the same abstraction (that of an "object") presented to the > programmer, no matter in which direction she looked (left or right > through the code, or down to the hardware). What if they look at the microcode, or directly at the hardware (if the underlying machine isn't implemented using microcode)? In order for the environment to be truly airtight, the microcode (or hardware) has to be airtight. It may be that producing a sufficiently airtight compiler that implements the object-oriented model by producing the "right" code, including checks, is either impossible or more costly than producing a microcode or hardware implementation atop a compiler that doesn't have to be so airtight because the microcode or hardware is. If people's experience points in this direction, fine. However, how is this different from, say, implementing the interpreter in some "conventional" machine language, or in some language compiled into "conventional" machine language? If it's not different, could the interpreter then try compiling the code to be interpreted into that machine language, preserving the run-time checks? Could this still be sufficiently airtight? If so, would it be faster than, say, a microcode implementation of the same interpreter with no compilation? How about a microcode implementation that compiles either into some faster-to-interpret code, or directly into microcode? > Actually, I didn't say it was the only way. In fact, all I've been > trying to argue is that there are reasons to believe that there are > things worth considering from that kind of research and that type of > programming environment. 
I'll certainly agree with that; however...

> Sometimes I feel like all we do is debate the race cars, when it's the
> production cars that represent all the money.

I won't agree with that, because

1) I don't think that's *all* people have been doing.  While there's
been relatively little discussion of it in *this* forum, I get the
impression - perhaps erroneously - that people are thinking about how to
implement the desired kind of programming environment, even if more in
the "atop a 'conventional' machine language" form than in the "machine
language that does lots of it directly" form; e.g., the SOAR people,
people doing LISP implementations on "conventional" machines, etc..

2) I don't think the analogy is appropriate.  For one thing, most
computers are - for better or worse - "conventional", so a better
analogy would be "sometimes it feels like all we do is debate the
production cars, when we should be concentrating on 'safety' cars", e.g.
cars with significantly new features to increase safety.

The second analogy raises some interesting questions:

1) How much are people willing to spend to have safer cars?  Are they
willing to pay X amount of extra money for airbags?  (How much of a
slowdown are people willing to accept for some amount of additional
run-time checking?)

2) How much safer does a particular feature make you?  (If 99% of the
errors a particular run-time check detects can be caught at
compile-time, how much safer are you if you catch the extra 1%?)

3) How many problems does a feature have that reduce its desirability -
or even its contribution to safety?  For example, if there's a risk of
airbags detonating spontaneously, what is the risk of this happening and
how bad is it if it does?  (What are the chances that the implementation
of the environment, and all its checking, has a bug in it, and how badly
are you screwed if it does?)

> Absolutely.  I tried to quantify the cost of 432-style object
> orientation in my thesis.
> In a nutshell, I concluded that that style
> of machine would be from 1 to 4 times slower than a conventional
> unprotected architecture made of equivalent implementation
> technology.

Just out of curiosity: How would this figure change with newer
implementation technologies for "conventional" machines (e.g. more
powerful optimizers, different architectural style (for which you may
read "the stuff that RISC implementors talk about a lot"), etc.)?  How
would it change with newer implementations for protected architectures
(including some of the same newer technologies mentioned for
"conventional" machines, and also techniques such as compiling
"protected architecture" code into "conventional" code, or doing "quick"
checks in hardware and trapping to software checking code if they fail)?
(For all I know, some of these may not be "newer" techniques; if so, how
did that change the relative performance?)  How easy is it to apply some
of those techniques to instruction-set architectures such as the 432's,
as opposed to applying them to, say, RISC architectures and using other
techniques to implement a "protected" system atop such a "conventional"
architecture?

> You could do their transparent multiprocessing and buy that factor of
> 4 back, but if you got above 6 or 7 you'd saturate the bus
> (memory-to-memory architecture).  Things get hard to sort out after
> that.

Yes, but could you buy an equivalent amount back with multiprocessing on
"conventional" architectures with the protection implemented elsewhere
than in the instruction set, even if the multiprocessing might be less
transparent?  Could it be a greater amount if the "conventional"
architecture were not so memory-reference-intensive?  Could you change
the "protected" system to make *it* less memory-intensive and improve
*its* multiprocessor performance?  (At this point, one would no longer
expect eyebrows to raise at the appearance of a new register-oriented
machine.... :-))

> Is that performance hit worth it?  Who knows?
> I'd say it's probably
> worth it a lot more often than most people currently think.  An awful
> lot of code got written and debugged on machines that were a lot slower
> than the could-have-been 432.  You have to allow for a natural
> tendency (which I think largely fuels this whole industry) to always
> want more and to refuse to ever go backwards in anything, speed,
> memory, disk, sw complexity...

Yes, but there may be an economic justification for that tendency; the
issues may not all be "technical" in the narrow sense.  It may be hard,
as you pointed out, to show improvements in programmer productivity,
code reliability, etc. in a better programming environment, but if the
advocates of such environments want people to spend money on those sorts
of improvements rather than straight performance improvements, they
should expect those who would pay the bills to be more swayed by that
kind of evidence.  (Another problem is that it may not even be
cost-effective to adopt a technology that's clearly "better",
technically; I suspect a new motor fuel would have to be quite a bit
better before people were willing to write off the existing investment
in the production and distribution of existing fuels.  There are a lot
of "conventional" machines out there.)

> Yeah, well, the Space Shuttle doesn't exactly have blazing hardware.
> But it's running some of the most sophisticated software I
> know of.  And telephone switchers aren't mini-crays by any means.

Yeah, well, I don't know that either of those processors would
necessarily have the requisite horsepower for some of the embedded
applications somebody might want to put computers into in the future.
That's why I said "as the jobs they do get more sophisticated, the minimum performance required increases;" a 4x performance hit relative to "conventional" computers may rule out a "protected" processor for some application until you can either come up with a way of solving the problem faster with existing hardware or make the hardware faster. I'm certainly willing to believe that there are applications now that could use a "protected" processor, but I'm not sure I'm willing to believe that there aren't applications that, at present, couldn't use a "protected" processor if it were 4x slower than a "conventional" one. In this case, a useful question to ask might be "can I get the same level of protection with an adequate level of performance if I don't put all the protection in the instruction set?" (BTW, I infer from the paper "Engineering a RISC Compiler System", by some people at MIPS, that their compiler can optimize register usage across procedure boundaries even when the calling and called procedures were separately compiled - they keep Ucode around for multiple compilation units, so that information for both calling and called procedure is available to the optimizer - so the claim made in "Fast Object-Oriented Procedure Calls: Lessons from the Intel 432" that "Clearly, the same approach (interprocedural register allocation) is inapplicable to separately compiled modules." is false. It may be true in some cases - for example, if the modules are bound together at run time - but it is hardly clear that it is always true. I believe David Wall has worked on link-time register allocation as well.)
colwell@mfci.UUCP (Robert Colwell) (07/28/88)
In article <61781@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:

[Guy, how come you always delete the previous correspondent's name?  Are
you trying to save me embarrassment?]

[lots of interesting questions and points to ponder removed...I think
this forum has run out of gas on a topic this complex, so in true Usenet
style I'm going to pick nits instead.  I'll try to get back to your 432
questions when I get a chance.]

>I'm certainly willing to believe that there are applications now that could use
>a "protected" processor, but I'm not sure I'm willing to believe that there
>aren't applications that, at present, couldn't use a "protected" processor if
>it were 4x slower than a "conventional" one.  In this case, a useful question
>to ask might be "can I get the same level of protection with an adequate level
>of performance if I don't put all the protection in the instruction set?"

If you think of the 432 as merely a "protected processor" you've not
fully grasped what they created there.  It was an *environment*, of
which the central processor was just a component.  You, as a programmer,
did not see the 432 programming environment as a processor running with
a certain amount of memory, a set of registers, some available OS calls,
a certain type of virtual memory, etc.  You the programmer were given a
new universe in which all the pieces obeyed the same rules, and the
rules were simple and few and not derived from hardware or language
artifacts (by and large, anyway).  I do not believe you can get this
environment piecemeal.  (But you can get closer to it than we currently
are, and we probably agree on both the feasibility and desirability of
this.)
>(BTW, I infer from the paper "Engineering a RISC Compiler System", by some >people at MIPS, that their compiler can optimize register usage across >procedure boundaries even when the calling and called procedures were >separately compiled - they keep Ucode around for multiple compilation units, so >that information for both calling and called procedure is available to the >optimizer - so the claim made in "Fast Object-Oriented Procedure Calls: >Lessons from the Intel 432" that "Clearly, the same approach (interprocedural >register allocation) is inapplicable to separately compiled modules." is >false. It may be true in some cases - for example, if the modules are bound >together at run time - but it is hardly clear that it is always true. I >believe David Wall has worked on link-time register allocation as well.) We were talking about procedure calls made within an object-based programming environment, and I don't think the claim is false. In such an environment you do not want to tie your parameter-passing abstraction, with its expected runtime type-checking, to whatever register convention happens to be the current fashion. We didn't say you couldn't do it -- we meant you don't want to. Bob Colwell mfci!colwell@uunet.uucp Multiflow Computer 175 N. Main St. Branford, CT 06405 203-488-6090
gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/29/88)
In article <2079@u1100a.UUCP> krohn@u1100a.UUCP (Eric Krohn) writes: >In article <8239@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes: >] As a significant example of C's efficiency for graphics programming, >] virtually all the code in the Blit, DMD (5620), and MTG (630) >] bitmapped terminals was written in C, and their graphics operations >] are extremely fast. No self-modifying code was necessary. >Virtually all? Yes. The most critical? No. >The bottom-most level of all screen updates is the infamous >bitblt function. A 1982 BTL memo by John Reiser tells about the on-the-fly >code generation done by bitblt to achieve rather good performance on the >original Blit. I've assumed that similar code was carried forward to the 630 >and the 5620 (even with the CPU switch). Rob tells me that the 5620 doesn't use this trick. (There is a chance that the 630 does; I don't know.) In the Feb 1985 Pike & Locanthi paper in SP&E (V15, p131-151) they explain how this trick was done. They also mention that the savings over their best C version average about 20%. Of course, bitblt is "bottleneck code", so every little bit helps, but on-the-fly generation of instructions isn't essential (the first three Blit bitblt() implementations were all written in C). That was admittedly a risky example; bottleneck code is often made to use some trick to speed things up.
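For readers who haven't seen the SP&E paper: the "best C version" that the on-the-fly code generation beat by about 20% works roughly like the sketch below (hypothetical code, not the actual Blit source). The trick is to make the raster-op decision once per call, outside the word loop, so the per-word cost is just one tight specialized loop:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long word;

/* One specialized inner loop per raster op, so the hot loop
 * contains no per-word tests. */
static void op_copy(word *dst, const word *src, size_t n)
{
    while (n--) *dst++ = *src++;
}

static void op_xor(word *dst, const word *src, size_t n)
{
    while (n--) *dst++ ^= *src++;
}

static void op_or(word *dst, const word *src, size_t n)
{
    while (n--) *dst++ |= *src++;
}

enum rop { ROP_COPY, ROP_XOR, ROP_OR };

/* blt one row of words: one dispatch per row, not one per word */
void bitblt_row(word *dst, const word *src, size_t n, enum rop op)
{
    static void (*const loops[])(word *, const word *, size_t) = {
        op_copy, op_xor, op_or
    };
    loops[op](dst, src, n);
}
```

Generating the loop on the fly goes one step further: it folds shift counts and edge masks into the instructions themselves, which is where the remaining 20% comes from.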
henry@utzoo.uucp (Henry Spencer) (07/30/88)
In article <477@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes: >... there is >an awful lot the compiler can't know about -- programs that use input >data, generate pointers, or do just about anything else that's >interesting, are going to do many operations that are difficult for >the compiler to validate at compile time. My personal thesis is that the difficulty of doing this in a well-written program -- I am not interested in poorly-written ones -- is consistently overestimated. Remember, the programmer is supposed to be sure that these errors aren't going to happen, because even if they are caught immediately they are generally a disaster for production code. Frequently, the sort of checks that a programmer must put in to be sure about such things are the sort of thing that a compiler ought to be able to spot fairly easily, e.g. validation of input data. >I'm not sure I catch your drift on the imposition of runtime checks. >To me, the best human debuggers have the ability to figure out the >fastest which assumptions are being made that are wrong, thus getting >to the heart of some problem the quickest. If I have a program bug, >I want the machine to express as directly as possible the intent of >my program so that I can narrow the bug-space down to my code alone. We may have a slight misunderstanding here, which I admit I didn't notice in my original posting. I'm not claiming that run-time checks aren't useful during debugging. However, I claim that run-time checks are not a particularly good approach to catching problems during production use, because "subscript error in line 32, core dumped" can be every bit as bad as trash output in a production environment. What one really wants is a *guarantee* that (in the absence of hardware malfunction), it will not happen. This cannot be achieved by testing, regardless of how much help the hardware provides. >I sure hope you're right. 
>In fact, if progress towards this goal
>becomes evident, I'd propose we start discussing ways of turning
>architecture towards better ways of supporting this instead of
>throughput or programming environments.

This is actually a programming-environment issue, though, and
architecture is nearly irrelevant.  A much more important issue is the
programming languages being used, and how tractable they are.  C, for
example, is not a very favorable case -- too many problems with
pointers.
--
MSDOS is not dead, it just     | Henry Spencer at U of Toronto Zoology
smells that way.               | uunet!mnetor!utzoo!henry henry@zoo.toronto.edu
oconnor@nuke.steinmetz (Dennis M. O'Connor) (07/30/88)
An article by ge@hobbit.sci.kun.nl (Ge' Weijers) says:
] From article <1087@ficc.UUCP>, by peter@ficc.UUCP (Peter da Silva):
] > Why are an Icache plus a Dcache better than just
] > a big shared cache as big as both?
]
] The answer is bandwidth. The CPU does not have to stop filling
] the instruction pipeline when it accesses/writes data.
] --
] Ge' Weijers, Informatics dept., Nijmegen University, the Netherlands

This is indeed part of the answer.  The other part is that
instruction-fetch behavior is much simpler and more predictable than
operand fetch, and that I-caches don't need to be written-through,
allowing an isolated I-cache to be simpler and more effective than a
combined operand/instruction cache.  Specialized I-caches, like a
Branch-Target cache combined with pre-fetch, can produce effective hit
rates of 99%+ with only a few thousand bits of storage.  Operand caches
can't do anywhere near this well.
--
Dennis O'Connor   oconnor%sungod@steinmetz.UUCP  ARPA: OCONNORDM@ge-crd.arpa
"Never confuse USENET with something that matters, like PIZZA."
henry@utzoo.uucp (Henry Spencer) (07/30/88)
In article <479@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
>But the whole point of capability machines (to name a single example)
>is that one cannot cover the space of all interesting
>exception-possibilities using only a compiler, no matter how smart.
>For one thing, the program could be coercing data types back and
>forth such that checking an operation on a type can only be done at
>the time the operation is applied.

Yes, it could be -- but how does the *programmer* know it's right?
Remember, programs are not supposed to dump core in production
operation.  Remember also Dijkstra's comments on how human minds are
actually fairly simple and find it difficult to reason about complex
situations.  There are three possibilities here:

1. Code is simple enough for human programmer to figure out easily, in
   which case the compiler should be able to do likewise, perhaps with
   a bit of help.

2. Code is wrong, human programmer is (e.g.) not validating his inputs
   properly.  Compiler should reject it.

3. The code really is right but very subtle.  Programmer has to have a
   way to tell the compiler "this is subtle but right".  This should
   not be the default, as it effectively is now.

My conjecture, not proven at the moment, is that only modest
intelligence in the compiler and occasional tidying-up of code for
clarity would be needed to make case 3 fairly rare, given favorable
circumstances (it is not clear that C constitutes a sufficiently
favorable circumstance).

>But a more fundamental matter is how one manages the development of a
>large software project with dozens of programmers contributing to a
>single large end product.  The modules are all separately compiled,
>so there is no question of the compiler helping out much.

Please consider modern compiler technology, not 20-year-old versions.
Ways to achieve inter-module checking are a major concern in modern
languages, as witness (e.g.) C++ and Modula 2.  It can be done, and
ought to be done!
>Given access to all the symbol tables, you could imagine the linker >doing some reasonable checks of consistency (number and types of >args, for instance), but even that fails when pointers are being >passed (pointers to functions, even). If the assumptions being made about those pointers are simple ones, they should be in the code, for compilers (and humans!) to see. If they are complex ones, then in a multi-programmer project they are probably being violated already. -- MSDOS is not dead, it just | Henry Spencer at U of Toronto Zoology smells that way. | uunet!mnetor!utzoo!henry henry@zoo.toronto.edu
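The inter-module checking Henry alludes to is roughly what ANSI C function prototypes deliver: every compilation unit that includes the shared header gets the same declaration, so a call with the wrong number or types of arguments is diagnosed at compile time even though caller and callee are compiled separately. A toy sketch, with the separate files inlined into one for brevity (file names and function names are invented):

```c
#include <assert.h>

/* --- stats.h: the shared interface every module includes --- */
double mean(const double *v, int n);

/* --- stats.c: the implementation, compiled on its own --- */
double mean(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return n ? s / n : 0.0;
}

/* --- caller.c: a separately compiled client --- */
double grade_point_average(void)
{
    double grades[4] = {4.0, 3.0, 3.0, 2.0};
    return mean(grades, 4);
    /* mean(grades) or mean(4, grades) would be rejected at compile
     * time in every module that includes stats.h -- no linker
     * cleverness required. */
}
```

Function *pointers* are covered too, as Guy noted earlier: a pointer to a function taking the wrong argument types is a distinct type, so simple assumptions about pointers really can live in the code where compilers (and humans) can see them.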
mcdonald@uxe.cso.uiuc.edu (07/30/88)
>This is indeed part of the answer.  The other part is that
>instruction-fetch behavior is much simpler and more predictable
>than operand fetch, and that I-caches don't need to be
>written-through, allowing an isolated I-cache to be simpler
>and more effective than a combined operand/instruction cache.

This statement, unfortunately, connects up with the recurrent
self-modifying-code theme.  SEPARATE I and D caches can result in
incorrect program execution if entries in the I cache are not voided if
the D cache gets written at the same location.  Fixing this requires
either hardware to do this, or a "flush-cache" instruction.  I strongly
prefer that hardware do this sort of housekeeping.  It is OK if there is
a "dead period" of a reasonable number of cycles or instructions during
which the I cache is wrong.  Some machines, however, seem to be able to
get this down to zero.

Doug McDonald
mike@arizona.edu (Mike Coffin) (07/31/88)
From article <1988Jul29.202400.28068@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer): > ... There are three possibilities here: > > 1. Code is simple enough for human programmer to figure out easily, in > which case the compiler should be able to do likewise, perhaps > with a bit of help. > > 2. Code is wrong, human programmer is (e.g.) not validating his inputs > properly. Compiler should reject it. > > 3. The code really is right but very subtle. Programmer has to have a > way to tell the compiler "this is subtle but right". This should > not be the default, as it effectively is now. 4. Code is simple enough for a human programmer to figure out easily given "deep" knowledge about what the program is doing, in which case a compiler has big troubles figuring out almost anything. I have worked on some mathematical software --- computational group theory, if it matters --- where the code itself looks broken unless you know the right theorem. The theorem establishes an invariant that makes the code very easy to understand. I just thought of a simple example: a piece of code relies on the fact that an array always contains a permutation of the integers 1:N. This invariant is originally established by generating a random permutation via a very clever algorithm (Floyd's), and then maintained by always permuting the integers --- sometimes swapping them, other times rotating several. A procedure that relies on the fact that the array is a permutation is going to be extremely opaque to the compiler, yet might be transparent to a programmer. (Especially if the array is called "permutation".) -- Mike Coffin mike@arizona.edu Univ. of Ariz. Dept. of Comp. Sci. {allegra,cmcl2,ihnp4}!arizona!mike Tucson, AZ 85721 (602)621-4252
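Coffin's permutation example is easy to make concrete. The sketch below states the invariant as a checkable predicate; the generator uses an inside-out Fisher-Yates shuffle (standing in for the Floyd algorithm the poster mentions, and over 0..N-1 rather than 1:N), and the later operations only ever permute the entries. A human armed with the theorem sees at once that subscripting by a[i] is always safe; a compiler staring at rand() has no chance:

```c
#include <assert.h>
#include <stdlib.h>

/* The invariant the programmer knows by theorem: a[] always holds
 * a permutation of 0..n-1. */
int is_permutation(const int *a, int n)
{
    int ok = 1;
    char *seen = calloc(n, 1);
    for (int i = 0; i < n; i++) {
        if (a[i] < 0 || a[i] >= n || seen[a[i]])
            ok = 0;
        else
            seen[a[i]] = 1;
    }
    free(seen);
    return ok;
}

/* Establish the invariant: inside-out Fisher-Yates shuffle. */
void random_permutation(int *a, int n)
{
    for (int i = 0; i < n; i++) {
        int j = rand() % (i + 1);   /* slightly biased; fine for a sketch */
        if (j != i)
            a[i] = a[j];
        a[j] = i;
    }
}

/* Maintain the invariant: swapping entries keeps a[] a permutation. */
void swap_entries(int *a, int i, int j)
{
    int t = a[i]; a[i] = a[j]; a[j] = t;
}
```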
henry@utzoo.uucp (Henry Spencer) (08/04/88)
In article <6490@megaron.arizona.edu> mike@arizona.edu (Mike Coffin) writes: >4. Code is simple enough for a human programmer to figure out easily >given "deep" knowledge about what the program is doing, in which case >a compiler has big troubles figuring out almost anything... Well, remember, I'm not talking about proof of correctness; all I want is absence of runtime errors. Even so, I concede that this sort of thing is likely to happen now and then. Having to turn the checking off once in a while (and, preferably, having to justify this in writing) still strikes me as better than not having it at all. -- MSDOS is not dead, it just | Henry Spencer at U of Toronto Zoology smells that way. | uunet!mnetor!utzoo!henry henry@zoo.toronto.edu
atbowler@watmath.waterloo.edu (Alan T. Bowler [SDG]) (08/15/88)
In article <61781@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>
>What if they look at the microcode, or directly at the hardware (if the
>underlying machine isn't implemented using microcode)?  In order for the
>environment to be truly airtight, the microcode (or hardware) has to be
>airtight.
>
>It may be that producing a sufficiently airtight compiler that implements the
>object-oriented model by producing the "right" code, including checks, is
>either impossible or more costly than producing a microcode or hardware
>implementation atop a compiler that doesn't have to be so airtight because the
>microcode or hardware is.  If people's experience points in this direction,
>fine.
>

Perhaps this is the wrong question.  There is also the danger that the
existence of the mandatory checks will increase program complexity as
the programmer finds that he has to program around the environment to
accomplish the task.  A classic example is those Pascal programs that
allocate 1 large static linear array and then run their own memory
allocator within it.  Instead of machine pointers they pass around
indexes into the array.  This has defeated the purpose of the Pascal
tight checking, reintroduced all the old danger of running off the end
of your allocated memory (into a chunk logically attached to another
pseudo pointer), and increased the notational complexity of the program
(and probability of errors).  If you examine these programs and what
they accomplish, you will usually conclude
 1) This was the best choice that could be made given the constraints
    of the language
 2) The programmer must have been persistent to get the thing working
    at all.
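The idiom Bowler describes looks like this when rendered in C (hypothetical code; the Pascal originals use the same shape with an array of records). The language's checking only ever sees "integer used as subscript", so running off the end of one allocation silently lands in storage that logically belongs to another pseudo-pointer:

```c
#include <assert.h>

/* One big static array plus a bump allocator handing out indexes
 * instead of pointers -- the "roll your own heap" idiom. */
#define HEAPSIZE 1000

int heap[HEAPSIZE];
static int next_free = 0;

/* returns the base index of a fresh block of n cells, or -1 if full */
int alloc_cells(int n)
{
    if (next_free + n > HEAPSIZE)
        return -1;
    int base = next_free;
    next_free += n;
    return base;
}
```

The failure mode is then invisible to any subscript check: if block a has 3 cells, `heap[a + 3]` is a perfectly in-range subscript of `heap`, yet it clobbers the first cell of the next allocation.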
lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) (08/22/88)
In article <20349@watmath.waterloo.edu> atbowler@watmath.waterloo.edu (Alan T. Bowler [SDG]) writes:
>..... A classic example are those Pascal programs
>that allocate 1 large static linear array and then run their own
>memory allocator withing it. Instead of machine pointers they
>pass arround indexes into the array. This has defeated the purpose
>of the Pascal tight checking, reintroduced all the old danger of
>running off the end of your allocated memory (into a chunk logically
>attached to another pseudo pointer), and increased the notational complexity
>of the program (and probability of errors). If you examine these programs
>and what they accomplish, you will usually conclude
> 1) This was the best choice that could be made given the constraints
>    of the language
> 2) The programmer must have been persistent to get the thing working
>    at all.
Just a comment on this: this is the standard technique for implementing
pointers and dynamic memory handling in Fortran, which, unlike Pascal,
doesn't have pointer types and dynamic memory allocation as language
primitives.  Are "those Pascal programs" maybe translated Fortran programs?

Bjorn Lisper
Robert_P_Rubendunst@cup.portal.com (08/30/88)
Every operating system that loads code into RAM and then executes it is running that nasty old self-modifying code. Every cache on every computer should be flushable by a CPU instruction.
rdubey@eecs.wsu.edu (Rakesh Dubey - grad student) (10/09/90)
>>> svissag@hubcap.clemson.edu
>>> Re: Two Questions.
About declaring the structure: it is possible, but I can see no reason why
you would want to do that.  Things like self-modifying code are simply not
worth it.  How would you debug/understand/prove code that modifies itself?
And since you are using a higher-level programming language, would you
modify the C source statements and recompile everything?

Coming back to the original point, this is the way you *can* do it.  But it
sure is stupid.
1. Read the name of the structure and the kinds of things the user wants
   in it (range, type, limits, etc.).
2. Create a file containing C source code which defines such a structure.
3. Now exec a Makefile which compiles your source file and the brand new
   one you just created.  You will finally have a new process running.
This might be called self-modifying higher-level code, and many extensions
come to mind.  But there will be a bunch of problems too.  Though I *think*
you can do it, you shouldn't need such tricky/unreliable stuff.
--
--- Rakesh Dubey
rdubey@yoda.eecs.wsu.edu
peter@neccan.oz (Peter Miller) (10/09/90)
in article <1990Oct08.183922.12541@eecs.wsu.edu>, rdubey@eecs.wsu.edu (Rakesh Dubey - grad student) says:
> Though I *think* you can do it, but you shouldn't be needing
> such tricky/unreliable kind of stuff.
I have used self-modifying code on a number of occasions, and while I don't
advise its use as a general rule, I think it has a place in every
programmer's box of tricks.  Examples:

1. On a Z80 I wrote some code which used an NMI (non-maskable interrupt).
The problem was that the Z80 has no 16-bit register-indirect load opcode,
only 8-bit, and (for various reasons which would take ages to describe) the
16-bit indirect load was unavoidable.  Murphy's Law sprang into play: I was
doing the 16-bit indirect as two 8-bit indirects, and every few hours the
NMI fell between the two, giving an impossible result and causing all sorts
of weird behaviour.  I solved it by declaring an array of 4 chars and
setting it to the absolute-load opcode, the 16-bit address, and a ret
opcode.  Then I cast the array pointer to a function pointer and called it.
Voila, a 16-bit indirecting function.

	char barf[4];
	barf[0] = 0x2A;		/* Z80 opcode: ld hl,(nn) */
	barf[1] = addr & 0xff;
	barf[2] = addr >> 8;
	barf[3] = 0xC9;		/* Z80 opcode: ret */
	result = (*(char *(*)())barf)();

2. On a pseudo-emacs editor I wanted real regular expressions, so I wrote
an RE compiler in Lisp which produced Lisp as output, and promptly executed
that output.  Lisp makes self-modifying code easy.

3. A very long time ago I used an Apple ][ (skeleton rattles).
Disassembling the disk driver is enlightening.  The disk I/O needed every
last cycle from the machine, so the code first calculated the hardware's
memory addresses and poked them into the absolute-load instructions of the
rest of the disk I/O code, thus getting better performance (absolute rather
than register-indexed addressing) and freeing up a register, too.

4. At some point, I realized that using a compiler is rather like
self-modifying code.
The compiler, itself a binary data file, chews on a text file and makes a
binary data file.  When we run the program we just compiled, we are asking
the OS to load a binary data file and leap into it.

Regards
Peter Miller	UUCP	uunet!munnari!pdact.pd.necisa.oz!pmiller
/\/\*		CSNET	pmiller%pdact.pd.necisa.oz@au
		ACSnet	pmiller@pdact.pd.necisa.oz
Disclaimer: The views expressed here are personal and do not necessarily reflect the views of my employer or of my colleagues.
jc@atcmp.nl (Jan Christiaan van Winkel) (10/12/90)
From article <829@neccan.oz>, by peter@neccan.oz (Peter Miller):
> 1. On a Z80 I wrote some code which used a NMI (non-maskable interrupt).
This reminds me of the code used in the 8080 BASIC interpreter by
Microsoft.  They had several entries into the error routine.  The error
routine expected an error number in register b.  Now what they had done was:

	ld hl,<some number>	; register pair hl gets the value <some number>
	ld hl,<some other number>
	ld hl,<some other number>

and so on.  The 16-bit numbers themselves were actually instructions:
ld b,errorcode.  By jumping into the middle of one of the ld hl,...
instructions, they would load the error code into b, and then execute some
dummy ld hl,... instructions that would not clobber the value in b, even
though the ld b,xxx instructions were just a byte away.  Although this is
not self-modifying code, it is 'shifting the bits a bit and interpreting
the result'.  Very clever.

> 4. At some point, I realized that using a compiler is rather like
> self-modifying code. The compiler, itself a binary data file, chews on a
> text file and makes a binary data file. When we run the program we just
> compiled, we are asking the OS to load a binary data file and leap into it.
Hmmm.  I think you should read Ken Thompson's Turing award lecture.  He
discussed the possibility of getting code into a C compiler without having
it in the source.  The trick is illustrated with the addition of a new
escaped character like \n.  In the lexical analyzer there is some sort of
code like this:

	case '\\':
	    switch (getnewchar()) {
	    case 'n': return '\n';
	    case 'a': return '\007';	/* the newly added character */
					/* my name's Bond, James Bond :-) */
	    .
	    .

Now compile the compiler, and you'll have a new compiler that recognizes
'\a'.  Now edit the source code to look like this:

	    case 'a': return '\a';

This is possible because the compiler will be compiled with the compiler
that knows about '\a'.
The result is a C compiler that knows that '\a' is in reality '\007', but nowhere in the source of the C compiler is that knowledge stored.  It is inherited from the previous generation of the C compiler.

JC
-- ___ __ ____________________________________________________________________ |/ \ Jan Christiaan van Winkel Tel: +31 80 566880 jc@atcmp.nl | AT Computing P.O. Box 1428 6501 BK Nijmegen The Netherlands __/ \__/ ____________________________________________________________________