[comp.lang.c] Self-modifying code

nather@ut-sally.UUCP (Ed Nather) (07/09/88)

In article <2972@geac.UUCP>, daveb@geac.UUCP (David Collier-Brown) writes:
> In article <429@uwovax.uwo.ca> 16012_3045@uwovax.uwo.ca (Paul Gomme) writes:
> [discussion of execute-only code segments]
> >	Besides, I thought that self-modifying code was (a) extremely difficult
> >to write, and (b) considered poor programming practice.
> 
>   Yes, it is and it is.
> 

This practice was formally named "horrid" about the time the GOTO was
banished, and for the same reasons: it's *very* hard to figure out what
is going on if either practice is exhibited unconstrained.  Yet the idea
of "self-modifying code" was one of the great motivators for encoding
instructions in the same form as data, and holding them in the same kind
of memory.  Prior to that time, "instructions" were mostly wires on a
plugboard and were hard to change in software.

It did take a long time and a lot of thought to make the GOTO really
unnecessary -- it's still hidden in C but not much used -- yet a similar
effort has not been applied to self-modifying code, and I've always
wondered why.  My guess -- and it's only a guess -- is that there has
not been much emphasis on real-time, time-critical programming techniques
in most CS departments, so the need to get maximal speed from a particular
chunk of hardware was not evident.

But it's really needed -- if you have to watch an animated display of data
on a CRT all night long that blinks and flickers, you learn very quickly
that techniques to get display refresh faster than the human flicker-frequency
can save lots of pounding headaches.  If they are "forbidden" techniques
that require a descent into assembly coding, so be it.  Headaches hurt.

Are the seeds of a new religion hidden here?

-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{backbones}!{noao,ut-sally}!utastro!nather
nather@astro.as.utexas.edu

mcdonald@uxe.cso.uiuc.edu (07/10/88)

> [discussion of execute-only code segments]
> >	Besides, I thought that self-modifying code was (a) extremely difficult
> >to write, and (b) considered poor programming practice.
> 
>   Yes, it is and it is.
> 
    Yes, it is and yes it is, but only by those who don't need it. There
are good uses for self-modifying code: I have used it recently in
two places (one on an IBMPC and the second on a PDP-11).
     The first is the use of self-modifying code in 
interrupt handlers. It is necessary sometimes to change data segment
in such a handler. The only place to put the value for the data segment
is in the code segment, because when the interrupt occurs, only
the CS register is valid. You could quibble about whether this would
be self-modifying code if you put the value out-of-line and got it
through a pointer. This example was in assembler of course. 
    The second use is not absolutely necessary - but the results sure
are nice. This involves changing the code in graphics routines on the
fly. Absolutely nothing is more important than making write-to-screen
routines fast (well, nothing having to do with computers.) I have
an example where 4 decisions about 1 instruction each need to be
made at run time. I can do three things:
1. Use a case statement or an if to select the proper one based on
   a flag.
2. Put the little area of code where these occur in a subroutine
   and call the proper one through a pointer. Unfortunately there
   would be maybe 10 subroutines with the various choices.
3. Set up the code at run time, which is self-modifying code.

   Possibility 1. is slow. No doubt about it. It reduces something
which should occur in the blink of an eye to a crawl. Slowness
is absolutely forbidden, so this is out.
   Possibility 2. won't be slow if a big enough chunk of code is
included around the messy part. But it is memory intensive. If the
smallest fast-enough chunk is 200 bytes, this adds up to 2000 bytes.
   Possibility 3. works fine and is fast enough, but requires
assembler intervention - doing it in C is possible but actually
harder (a lot harder). It also upsets certain religious net-persons.

   What did I actually do? On the PC, I used 10 different subroutines.
On the PDP-11, where there is only 56K memory, I used self modifying code.
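   Roughly, in C, possibility 2 looks like the sketch below (the names
are invented for illustration; this is not the actual code):

	/* one specialized inner-loop routine per combination of flags;  */
	/* pick one up front, then call it through a pointer so the      */
	/* time-critical loop contains no per-pixel tests                */
	extern void blt_plain(void), blt_c1(void), blt_c2(void), blt_c1c2(void);

	void draw(int c1, int c2, long npixels)
	{
	    void (*blt)(void) = c1 ? (c2 ? blt_c1c2 : blt_c1)
	                           : (c2 ? blt_c2   : blt_plain);
	    long i;

	    for (i = 0; i < npixels; i++)
		(*blt)();
	}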

   This brings up another point. A fourth alternative, possible
in Fortran but not in C, is the computed or assigned goto. The 
overhead for these is less than a subroutine call, so the amount of
code you need in a clump for it to be fast enough is small. 
Alternatively, if the compiler were smart enough (mine wasn't) it
could convert a case statement into the equivalent of a computed goto.

Doug McDonald

pardo@june.cs.washington.edu (David Keppel) (07/11/88)

[ What about that self-modifying code ]
nather@astro.as.utexas.edu (Ed Nather) writes:
> "horrid"
[ hard to write, understand, maintain ]
>
>But it's really needed -- if you have to watch an animated display
>of data on a CRT all night long that blinks and flickers, you learn
>very quickly that techniques to get display refresh faster than the
>human flicker-frequency can save lots of pounding headaches.  If
>they are "forbidden" techniques that require a descent into
>assembly coding, so be it.  Headaches hurt.

Some reasons NOT to write S-M (Self Modifying or Sado-Masochistic) code:
+ It's machine dependent (we already knew that).
+ Read-only (e.g., code) pages may be shared.
+ Some machines have sperate I and D spaces.  It's not inconceivable
  that different machines in the same family will have different
  memory models.
+ Many machines use seperate instruction and data caches.  Instruction
  caches can be made much simpler (= smaller = faster) by not
  requiring them to pay attention to the data bus.  Writing to a
  memory location is not guaranteed to update the I cache, thus you
  may or may not execute the modified instruction.  Next week's model
  may have a large enough cache that this week's code breaks.

Good enough?

	;-D on  ( Compiler hints won't save you here )  Pardo

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/11/88)

In article <225800044@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>I can do three things:

Although you didn't spell out the details of your situation,
I'm sure there are quite a few other possibilities.  For example,
one can index an array to obtain the switch flag.
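One way to read that suggestion (a sketch only; op_table and the flags
are invented names): fold the run-time conditions into a single index,
so the inner loop does one table lookup instead of a chain of tests.

	extern void (*op_table[16])(void);  /* one routine per combination */

	int which = (c1 << 3) | (c2 << 2) | (c3 << 1) | c4;
	(*op_table[which])();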

As a significant example of C's efficiency for graphics programming,
virtually all the code in the Blit, DMD (5620), and MTG (630)
bitmapped terminals was written in C, and their graphics operations
are extremely fast.  No self-modifying code was necessary.

One of my "back burner" projects is to produce a display-list driven
interactive 3D viewer for these terminals (and probably port it to
Suns).  I have no doubt that it can be done quite nicely while
sticking to the C language rules.

nather@ut-sally.UUCP (Ed Nather) (07/11/88)

In article <5254@june.cs.washington.edu>, pardo@june.cs.washington.edu (David Keppel) writes:
> [ What about that self-modifying code ]
> nather@astro.as.utexas.edu (Ed Nather) writes:
> > "horrid"
> 

Tsk tsk -- out of context, like the movie ad that quotes a reviewer who said:
"This movie is a great waste of time -- it's insipid, boring, and stupid.  I've
never seen anything like it."

The ad quoted:   "...great...I've never seen anything like it."

I said the practice "...was formally declared horrid ...", NOT that *I* thought 
it was.  I don't think that.

> Some reasons NOT to write S-M (Self Modifying or Sado-Masochistic) code:
> + It's machine dependent (we already knew that).

Assembly code does tend to be machine dependent, all right -- but C was 
designed to give the programmer the same facility, but with better control and
better discipline.  It works quite well for those things it covers.  It is, in
some reasonable sense, portable.  A good compiler could generate self-modifying
code to suit, I'm sure.  What we lack is the formal discipline for using it
wisely and well.

> + Read-only (e.g., code) pages may be shared.

Only in a time-sharing system.  Most real-time control programs can't tolerate
such an environment for lots of reasons, mostly due to critical timing 
requirements.  Anyway, if generated code is treated as "executable data" this 
whole point becomes irrelevant.

> + Some machines have sperate I and D spaces.  It's not inconceivable
>   that different machines in the same family will have different
>   memory models.

Not a problem.  Generated executable code changes, just like any other variable
values, and can be treated the same way.

> + Many machines use seperate instruction and data caches.  Instruction
>   caches can be made much simpler (= smaller = faster) by not
>   requiring them to pay attention to the data bus.  Writing to a
>   memory location is not guaranteed to update the I cache, thus you
>   may or may not execute the modified instruction.  Next week's model
>   may have a large enough cache that this week's code breaks.
> 

Again, not a problem.  You don't take the generated code out of the I cache,
but out of the D cache, since it can (by definition) change.

> Good enough?

Sorry, no.  I've heard LOTS of arguments against programs that generate their
own code, and all of them -- so far as I am aware -- assume the "proper" answer
in generating arguments to show the answer is proper.  Be honest: suppose you
*had* to do it this way, but needed a formal discipline to keep it clean and
understandable, and *fast.*  Couldn't you find a reasonable way?

"It's unclean because ... well, because it's so ... so ... UNCLEAN!"

-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{backbones}!{noao,ut-sally}!utastro!nather
nather@astro.as.utexas.edu

boyne@hplvly.HP.COM (Art Boyne) (07/11/88)

pardo@june.cs.washington.edu (David Keppel) writes:

> Many machines use seperate instruction and data caches.  Instruction
                    ^^^^^^^^

To quote a high school chemistry teacher:

'When will you guys realize that there is "a rat" in sep*arat*e?'

Art Boyne, !hplabs!hplvly!boyne
'Think, guys, think - it only hurts for a little while!'

rwhite@nusdhub.UUCP (Robert C. White Jr.) (07/12/88)

in article <225800044@uxe.cso.uiuc.edu>, mcdonald@uxe.cso.uiuc.edu says:
>     Yes, it is and yes it is, but only by those who don't need it. There
> are good uses for self-modifying code: I have used it recently in
> two places (one on an IBMPC and the second on a PDP-11).
>      The first is the use of self-modifying code in 
> interrupt handlers. It is necessary sometimes to change data segment
> in such a handler. The only place to put the value for the data segment
> is in the code segment, because when the interrupt occurs, only
> the CS register is valid. You could quibble about whether this would
> be self-modifying code if you put the value out-of-line and got it
> through a pointer. This example was in assembler of course. 

Point 1)  This is _not_ "self modifying" code as I learned it...
"self modifying code" are things (in IBMPC assem) like:

tests:	jbne	27
	je	27
	ja	27

proc	aaa	far
	movsw	tests[SI],ordin
	test	AX,BX
ordin:	je	27
	...
	endp

Where an entry condition changes the body of the code to reflect
the data, instead of writing the code to handle every inline
possibility as inline options.  This is bad practice, and nearly
impossible to follow when the substituted code sections are larger.

	

Point 2)  you should place all your data and code for an interrupt
on an IBMPC in one segment and reach it through CS anyway.  Once
there you may juggle registers to your heart's content.  To find
"local" data, you make load-address, or CS-segment-data-spaces to
store the relative information which you set up during init, or even
as EXEC loader directives; to wit: dw CSEG; dw OFFSET init. (etc.)

lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) (07/12/88)

In article <12357@ut-sally.UUCP> nather@ut-sally.UUCP (Ed Nather) writes:
>In article <5254@june.cs.washington.edu>, pardo@june.cs.washington.edu
(David Keppel) writes:
>> [ What about that self-modifying code ]
....
>> Some reasons NOT to write S-M (Self Modifying or Sado-Masochistic) code:
....
>> + Read-only (e.g., code) pages may be shared.
>
>Only in a time-sharing system.  Most real-time control programs can't tolerate
>such an environment for lots of reasons, mostly due to critical timing 
>requirements.  Anyway, if generated code is treated as "executable data" this 
>whole point becomes irrelevant.

There are time-sharing systems that can handle critical tasks (e.g.
real-time applications with strict timing requirements) well. These systems
can give tasks priorities. A critical task can be given
a high priority to enable it to meet its time limits. Such a system may very
well have code that is shared.

Another related issue is that self-modifying code cannot be executed
directly from ROM. Executing programs in ROM is an important memory-saving
technique in small, diskless, special-purpose systems.

>> Good enough?
>
>Sorry, no.  I've heard LOTS of arguments against programs that generate their
>own code, and all of them -- so far as I am aware -- assume the "proper"
>answer in generating arguments to show the answer is proper.  Be honest:
>suppose you *had* to do it this way, but needed a formal discipline to keep
>it clean and understandable, and *fast.*  Couldn't you find a reasonable way?
>
>"It's unclean because ... well, because it's so ... so ... UNCLEAN!"

I guess there are situations with extreme performance requirements where
self-modifying code is justified. It would be interesting to see what a
semantics for self-modifying programs would look like. Regarding the
"uncleanliness" I think there are two main sources for this. The first is
the difficulty for a human mind to trace what's going on in a self-modifying
program. The second is an advantage read-only data (e.g. non-self-modifying
code) has over writable data: read-only data can be accessed asynchronously,
in any order, even concurrently, without any constraints. Therefore
read-only is "clean" in a concurrent environment. An example apart from code
is data bases, where read-only data don't have to be locked during
transactions.

Bjorn Lisper

pardo@june.cs.washington.edu (David Keppel) (07/12/88)

nather@ut-sally.UUCP (Ed Nather) writes:
>pardo@june.cs.washington.edu (David Keppel) writes:
>>[ Out of context quote ]
>[ Funny retort. ]
Sorry.  I do like your rejoinder, though.   Could be about
self-modifying code, even   :-)

> Sorry, no.  I've heard LOTS of arguments against programs that
> generate their own code, and all of them -- so far as I am aware
> -- assume the "proper" answer in generating arguments to show the
> answer is proper.  Be honest: suppose you *had* to do it this way,
> but needed a formal discipline to keep it clean and understandable,
> and *fast.*  Couldn't you find a reasonable way?

Let me rephrase my position.  There is nothing architecturally weird
about programs that generate their own code.  Doing so does not cause
any problems on any machines that I am aware of, although few
OPERATING SYSTEMS support this.

There is a *major* problem with modifying existing code.  Consider a
machine with seperate I-cache and D-cache.  If you execute an
instruction at location 0x500, then one at 0x504 that branches back
to the one at 0x500, and the one at 0x500 modifies itself, either:

  + The cache can be built so that it pays attention to the changing
    data (code) in the I-cache.  This requires a "snoopy" cache, which
    is more complicated, less dense, and slower.  If you are
    constrained by price or die space, you may be forced to use a
    slower, smaller (worse hit-ratio) cache.
  + You can execute out of the D-cache.  This is effectively the
    same as having a snoopy I-cache.  If you have BOTH an I-cache and
    a D-cache, you're left with coherency problems that are solved by
    building a snoopy I-cache.
  + You can define this operation to be illegal, build a smaller,
    faster, non-snoopy cache, and pay the price when you need to run
    code that could be expressed "naturally" as self-modifying code.

Assume that you have an algorithm that is, say, 1000 instructions and
that each instruction takes 4 cycles to execute (must be a RISC :-).
One particular part of it, 10 instructions, is written as
self-modifying code.  It can also be written as 100 instructions of
non-self-modifying (static) code.  Further assume that the effect of
having a non-snoopy cache is to make the computer run 10% faster
(e.g., the cache is faster AND has a better hit rate).  Then the
running time of the program on the machine with the snoopy I-cache
is:

    T = 990 * 4.0 + 10 * 4.0 = 4000 cycles

On the machine without a snoopy I-cache (10% faster),

    T = 990 * 3.6 + 100 * 3.6 = 3924 cycles

which is marginally FASTER for NOT using self-modifying code,
even though it takes an additional 90 instructions.


	;-D on  ( Cycles to burn reading flames )  Pardo

mash@mips.COM (John Mashey) (07/12/88)

In article <12357@ut-sally.UUCP> nather@ut-sally.UUCP (Ed Nather) writes:
....
>> + Many machines use seperate instruction and data caches.  Instruction
>>   caches can be made much simpler (= smaller = faster) by not
>>   requiring them to pay attention to the data bus.  Writing to a
>>   memory location is not guaranteed to update the I cache, thus you
>>   may or may not execute the modified instruction.  Next week's model
>>   may have a large enough cache that this week's code breaks.
>> 
>
>Again, not a problem.  You don't take the generated code out of the I cache,
>but out of the D cache, since it can (by definition) change.

No, machines that do this don't work that way, i.e., you only execute
code from the I-cache or main memory, not from the D-cache (or the cache
currently acting as the D-cache, in cases where you can swap them).

There is nothing wrong with generating code on the fly.
What is wrong is assuming a model of machines that looks just like
a CPU + DRAM, or with caches that act only like coherent, joint I&D caches.

This is just one instance of a general problem: how do you handle
caches that are visible to the executable code in any way?
What's needed is a good set of cache-control primitives that:
	a) Cover all of the cases
	b) Become widely used enough for people to count on.
	c) Are easy to make nops for environments where they don't matter.
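As a sketch only (these names are invented, not an existing interface),
such primitives might look like:

	#include <stddef.h>

	void dcache_wb(void *addr, size_t len);   /* write dirty D-lines back */
	void icache_inv(void *addr, size_t len);  /* discard stale I-lines    */

On machines with joint or coherent caches both can compile to nothing,
which is point c).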
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

chasm@killer.UUCP (Charles Marslett) (07/12/88)

In article <33441@yale-celray.yale.UUCP>, lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) writes:
> Another related issue is that self-modifying code cannot be executed
> directly from ROM. Executing programs in ROM is an important memory-saving
> technique in small, diskless, special-purpose systems.

Actually, I always thought that ROM was much more important as a way of
eliminating the need to load the code and as a way of guaranteeing non-
self-modifying-ness.  Usually ROM code is larger (thus memory-consuming)
than RAM-loaded code.  This is because most RAM based systems assume some
form of selfmodification of code, ranging from relocation tables that avoid
the need for self-relative addressing everywhere to file initialization code
that gets overwritten by its own buffer.  On the other hand, some RAM based
systems require 4 Mb to run an editor, so you may be right in those cases
:->} !!

Seriously, newer architectures have reduced the difference (to 0 perhaps),
but the emphasis on RISC these days may resurrect self-modifying code --
a RISC-er technique is not known to mankind!  (:-D}===<<).

> Bjorn Lisper


Charles Marslett

hjm@cernvax.UUCP (hjm) (07/12/88)

If the program is considered to be data for an interpreter which I will for
convenience call a CPU, then does self-modifying code seem any different?  As
long as the data in this table remains consistent, then there should be no
difficulty in dealing with it mathematically.  All of the problems of database
consistency have been tackled (but not all solved) some time ago.  Code is data
according to a PATCH program, and the only tricky bits there are to ensure that
any changes fit inside the space that they replace and to patch up the jumps
where necessary.  A bit like a structure linked by pointers, such as a binary
tree, except that code is in general a directed cyclic graph (the directedness
is due to the program counter).

Now, if the routines that ensure the consistency of the code change themselves,
then the world will most likely fall down around one's head, but then things get
to be so recursive that my stack overflows ...

	Hubert Matthews

nather@ut-sally.UUCP (07/12/88)

In article <33441@yale-celray.yale.UUCP>, lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) writes:
> 
> I guess there are situations with extreme performance requirements where
> self-modifying code is justified. It would be interesting to see what a
> semantics for self-modifying programs would look like. 

I agree.  That was the whole point of my original posting -- to see if someone
with a talent for languages could ignore the current prejudice against this
practice, and see what might be done.  If there's really no way to do it in a
reasonable manner, fine -- but just *saying* it's awful doesn't *make* it awful.

> The first [source of uncleanliness] is
> the difficulty for a human mind to trace what's going on in a self-modifying
> program. 

I think this arises because of the two different intellectual "levels" involved:
what the program is doing which generates the instructions (by analogy with the
instruction generator in a compiler) and what those instructions are going to
do when they are executed.  These *must* be kept completely separate or massive
confusion results.  But if the written code is approached with full awareness
of this requirement, the confusion vanishes -- in my experience, anyway.

> The second is an advantage read-only data (e.g. non-self-modifying
> code) has over writable data: read-only data can be accessed asynchronously,
> in any order, even concurrently, without any constraints. Therefore
> read-only is "clean" in a concurrent environment. 

I believe self-modifying code was condemned long before concurrent environments
were even possible.  This may be a disadvantage, but *no* technique exists
without trade-offs.  I think we can live with this one.

> An example apart from code
> is data bases, where read-only data don't have to be locked during
> transactions.

OK, but I'm not sure this is relevant here.  I do, however, appreciate your
keeping "data" plural.  It happens so rarely nowdays ...


-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{backbones}!{noao,ut-sally}!utastro!nather
nather@astro.as.utexas.edu

dg@lakart.UUCP (David Goodenough) (07/13/88)

From article <225800044@uxe.cso.uiuc.edu>, by mcdonald@uxe.cso.uiuc.edu:
> 
>> [discussion of execute-only code segments]
>> >	Besides, I thought that self-modifying code was (a) extremely difficult
>> >to write, and (b) considered poor programming practice.
>> 
>>   Yes, it is and it is.
>> 
>     Yes, it is and yes it is, but only by those who don't need it. There
> are good uses for self-modifying code: I have used it recently in
> two places (one on an IBMPC and the second on a PDP-11).

These are MY OPINIONS ONLY - you are free to agree and disagree and flame as
you see fit. I have used S-M-C only once when doing a section of code that
handled single stepping. The problem W/ the Z80 (comp.lang.c ??????) is that
it has conditional jumps, calls AND returns. Now I go and fetch an instruction
out of the code portion (i.e. where my PC is pointing to). It's 0xc2. Aha, I
have a conditional instruction. Now to figure out whether the condition is
met I have two choices:

	1. Decode the bits that determine which flag is being looked at, and
		whether the flag should be set or reset. Get the flags into
		some register where I can use them. Mask out the flag in
		question. Do a conditional jump on the result of the mask
		and whether the flag should be set or clear.

	2. Turn the instruction into a conditional jump (and with a mask
		then or with a mask - turns any conditional (except the
		relative jumps) into a conditional jump). Drop the conditional
		jump into a hole - get the flags and do the jump.
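In C, the mask trick in choice 2 is roughly this (a sketch; the buffer
names are made up):

	/* Z80 conditionals are encoded 11ccc000 (RET cc), 11ccc010        */
	/* (JP cc,nn) and 11ccc100 (CALL cc,nn), so clearing the low three */
	/* bits and OR-ing in 010 keeps the condition ccc but forces the   */
	/* JP cc,nn form                                                   */
	unsigned char op   = code[pc];              /* e.g. 0xc2             */
	unsigned char jump = (op & 0xf8) | 0x02;    /* now a conditional jp  */
	hole[0] = jump;                             /* drop it into the hole */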

If someone wants to see the code that I produced for both of the above I can
e-mail. I ask you to take it on faith that 1 was about 40-50 lines, whereas
2 was 6 lines. Also BECAUSE I COMMENTED, it was possible to figure out what
was going on. My mark of good commenting is code that can be read a year later
and still understood. I agree that this is not for the faint of heart, but
it was faster, and to my mind easier to understand. Note also that in this
application speed was a premium: when I'm single stepping 1000 instructions
I want things to happen PDQ as I would like to see the program appear to run
as fast as possible. But then I'm a little insane, because who in their
right mind writes a single step utility for a dead micro like the Z80 :-)

Like everything, it has its place, and is not to be misused.  Misuse of
S-M-C *_IS_* a sin (well I think so), but where it is justified I will
use it.
-- 
	dg@lakart.UUCP - David Goodenough		+---+
							| +-+-+
	....... !harvard!cca!lakart!dg			+-+-+ |
						  	  +---+

baum@Apple.COM (Allen J. Baum) (07/13/88)

[]
>In article <5262@june.cs.washington.edu> pardo@uw-june.UUCP (David Keppel) writes:
>
>There is a *major* problem with modifying existing code.  Consider a
>machine with seperate I-cache and D-cache.  If you execute an
>instruction at location 0x500, then one at 0x504 that branches back
>to the one at 0x500, and the one at 0x500 modifies itself, either:
>
>  + The cache can be built so that it pays attention to the changing
>    data (code) in the I-cache....
>  + You can execute out of the D-cache....
>  + You can define this operation to be illegal....

or you can recognize the situation and issue a write-back-Dcache-line
instruction, and a flush-Icache-line instruction, and everything works.
Note that a compile-and-go situation is exactly this; you are building
a program in the Dcache, and then you want to execute it. While it may
be unlikely that any remnants are left over after linking, etc., this
may be the only way to guarantee it.
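In code, the recipe is roughly (the flush primitives here are
hypothetical stand-ins; the real ones vary by machine and OS):

	memcpy(codebuf, new_code, len);   /* build instructions in data space */
	dcache_wb(codebuf, len);          /* push them out to memory          */
	icache_inv(codebuf, len);         /* forget any stale I-cache lines   */
	((void (*)(void))codebuf)();      /* only now is it safe to execute   */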

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385

lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) (07/13/88)

In article <4776@killer.UUCP> chasm@killer.UUCP (Charles Marslett) writes:
>In article <33441@yale-celray.yale.UUCP>, I write:
>> Another related issue is that self-modifying code cannot be executed
>> directly from ROM. Executing programs in ROM is an important memory-saving
>> technique in small, diskless, special-purpose systems.
>
>Actually, I always thought that ROM was much more important as a way of
>eliminating the need to load the code and as a way of guaranteeing non-
>self-modifying-ness.  Usually ROM code is larger (thus memory-consuming)
>than RAM-loaded code.  This is because most RAM based systems assume some
>form of selfmodification of code, ranging from relocation tables that avoid
>the need for self-relative addressing everywhere to file initialization code
>that gets overwritten by its own buffer....

As I wrote, I was actually thinking primarily of diskless systems. In such a
system the program is stored in ROM. If the program is self-modifying, its
code has to be loaded into RAM before it can execute. If not, it can be
executed directly from ROM and only the data areas need to be RAM. Thus memory
is saved (we don't need the extra RAM to hold the code).

Bjorn Lisper

hjm@cernvax.UUCP (hjm) (07/13/88)

I have been mulling over the idea of self-modifying code (SMC) for a while and
I've come to the conclusion that there is no good definition of SMC.

For example, if treating code as data is the definition, then does passing a
procedure as a parameter in PASCAL, or a pointer to a function in C count?
Probably not.

OK, what about a jump table.  Consider an array of pointers to functions in C.
Does changing the pointers count as SMC?  Again, I don't think so.

So, changing a pointer by assigning to it is not SMC, but putting a new jump
instruction in (e.g. jmp #somewhere_else) in place of an existing instruction
is SMC.  Does one level of indirection really make that much difference?

Of course, if you want to be really horrid in C, you can try something like
this:

char codearray[] = { 0x03, /* one byte instruction for something */
		     0xc9  /* one byte return instruction */
		   };

and then 'call' this function using a cast to turn the pointer codearray into
a pointer to a function.  (Who needs #asm anyway!)  Then you can modify the
code as much as you want.  This _is_ SMC without a doubt, because you can
overwrite code.  So, I propose a very weak definition for SMC as code that
writes over other code.
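The cast-and-call looks something like this (a sketch; not defined by
the language rules, of course):

	int (*fn)() = (int (*)())codearray;  /* pretend the bytes are code  */
	(*fn)();                             /* execute them                */
	codearray[0] = 0x00;                 /* overwrite an instruction:   */
	                                     /* SMC by the definition above */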

As a final note, why is it 'clean' to alter a jump table and 'unclean' to alter
an inline constant (e.g. jmp @offset(r0) uses a value in memory as the address
but mov (pc)+,#1234 which loads an immediate does so too)?  Why the subtle
difference?  Any thoughts on the subject?

	Hubert Matthews

(I don't consider LISP or PROLOG programs that create code on the fly to be
SMC.  Does anyone disagree?)

krohn@u1100a.UUCP (Eric Krohn) (07/13/88)

In article <8239@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
] As a significant example of C's efficiency for graphics programming,
] virtually all the code in the Blit, DMD (5620), and MTG (630)
] bitmapped terminals was written in C, and their graphics operations
] are extremely fast.  No self-modifying code was necessary.

Virtually all?  Yes.  The most critical?  No.
The bottom-most level of all screen updates is the infamous
bitblt function.  A 1982 BTL memo by John Reiser tells about the on-the-fly
code generation done by bitblt to achieve rather good performance on the
original Blit.  I've assumed that similar code was carried forward to the 630
and the 5620 (even with the CPU switch).
My 630's host is down for hardware upgrade now, so I can't disassemble the
firmware to see what code is really there.
-- 
--
Eric J. Krohn
krohn@ctt.ctt.bellcore.com  or  {bcr,bellcore}!u1100a!krohn
Bell Communications Research,	444 Hoes Ln,    Piscataway, NJ 08854

hankd@pur-ee.UUCP (Hank Dietz) (07/14/88)

In article <12360@ut-sally.UUCP>, nather@ut-sally.UUCP (Ed Nather) writes:
> In article <33441@yale-celray.yale.UUCP>, lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) writes:
> > 
> > I guess there are situations with extreme performance requirements where
> > self-modifying code is justified. It would be interesting to see what a
> > semantics for self-modifying programs would look like. 
> 
> I agree.  That was the whole point of my original posting -- to see if someone
> with a talent for languages could ignore the current prejudice against this
> practice, and see what might be done.  If there's really no way to do it in a
> reasonable manner, fine -- but just *saying* it's awful doesn't *make* it awful.

The key problem with self-modifying code is that code created at runtime may
REPLACE code that appeared at "compile" time, hence, in general, no STATIC
understanding of the code is possible.  This can make it hard for humans to
understand the code, but more importantly it implies that compilers cannot
perform optimizations, e.g., you cannot perform register allocation / value
propagation / common subexpression elimination / in-line expansion because
the code you remove might be code which is modified at runtime.

The fix need not be the elimination of the generate-and-execute programming
concept, rather, the language merely needs to specify constraints on the code
which is to be generated at runtime.  For example, key constraints would be
that the runtime generated code could not redefine any existing function,
could only be invoked through a statically-specified interface, and could only
access data which either was statically-declared as accessible through that
interface or was created as new data local to the new code.  In other words,
we can permit code to add to itself, but not to redefine what existed at
compile time:  self-extending code.
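A rough sketch of such a statically-specified interface (the names are
invented for illustration only):

	typedef double (*new_fn)(const double *args, int nargs);

	/* the only gate through which runtime-generated code may be    */
	/* entered; the generator may add functions, but may not        */
	/* redefine anything that existed at compile time               */
	new_fn add_generated_code(const struct code_spec *spec);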

An outline of fixes to be applied to Lisp (to create "Refined Lisp") appears
in my PhD thesis:  H. Dietz, "The Refined-Language Approach to Compiling for
Parallel Supercomputers," June 1987.  The folks working on Scheme had similar
ideas....

						-hankd

radford@calgary.UUCP (Radford Neal) (07/14/88)

There are interesting cases where on-the-fly generation of code seems
to be essential to get good asymptotic space and/or time complexity.

Consider the following code fragment (version A):

     for (i = 0; i<L; i++)
     { if (c1) p1();
       if (c2) p2();
       ...
       if (cN) pN();
     }

The Boolean variables c1, c2, ... cN are assumed to be loop invariants.
Assuming the pi all take the same constant time and code space, the
time complexity of this code is O(N), and its space complexity is also
O(N). (I'm ignoring the factor of L for number of loop iterations.)

Now consider the following equivalent code, specialized for N=2, but
clearly generalizable to any N (version B):

     if (c1)
     { if (c2) for (i = 0; i<L; i++) { p1(); p2(); }
       else    for (i = 0; i<L; i++) { p1(); }
     }
     else
     { if (c2) for (i = 0; i<L; i++) { p2(); }
       else    /* Nothing */ ;
     }

Code like this will have a time complexity of O(M), where M is the
number of ci that are actually true, and a space complexity of O(2^N).

Let's say that M, the number of true ci, is typically logN. This gives
the following comparison for versions A and B:

     version  space   time

        A     O(N)    O(N)
        B     O(2^N)  O(logN)

Thus you get to choose between time that is exponentially worse than
it might be, or space that is exponentially worse than it might be.

The following lets you get both (version C):

     start generating code;
     if (c1) generate instruction to call p1;
     if (c2) generate instruction to call p2;
     ...
     if (cN) generate instruction to call pN;
     for (i = 0; i<L; i++) execute generated instructions;

This gives you the low time complexity of version B, with the low
space complexity of version A. It's machine dependent, however.

It _is_ possible to get the same form of time complexity without
machine dependent operations, as follows (version D):

     void (*pa[N+1])();
     j = 0;
     if (c1) pa[j++] = p1;
     if (c2) pa[j++] = p2;
     ...
     if (cN) pa[j++] = pN;
     pa[j] = 0;
     for (i = 0; i<L; i++)
     { for (j = 0; pa[j]; j++) (*pa[j])();
     }

This essentially works by generating code in a very specialized
language, and then interpreting that code. This gives the same
form of time complexity as version B, but with a larger constant
factor - perhaps considerably larger if the pi aren't really procedures,
but just short code sequences.

This is the general sort of reason for the bit-blit algorithms that
generate code on the fly, and I've encountered the same sort of 
situation in other contexts (e.g. a "life" program). One could 
imagine a compiler performing the transformation from version A
to the machine-dependent version C, but I've never heard of such.
Even if it did, it is a bit disturbing that a transformation of
such significance for space/time complexity isn't expressed at the 
source level. This would be solved by having the compiler transform
version D to the faster version C, but that's probably even harder
for the compiler-writer to accomplish than the A to C transformation.

    Radford Neal

peter@ficc.UUCP (Peter da Silva) (07/14/88)

I have a question:

	Why are an Icache plus a Dcache better than just
	a big shared cache as big as both?
-- 
Peter da Silva  `-_-'  Ferranti International Controls Corporation.
"Have you hugged  U  your wolf today?" (uunet,tness1)!sugar!ficc!peter.

nather@ut-sally.UUCP (Ed Nather) (07/14/88)

In article <1100@nusdhub.UUCP>, rwhite@nusdhub.UUCP (Robert C. White Jr.) writes:
> "self modifying code" are things (in IBMPC assem) like:
> 
[ code omitted]
> 
> Where an entry condition changes the body of the code to reflect
> the data, instead of writing the code to handle every inline
> possibility as inline options.  This is bad practice, and nearly
> impossible to follow when the substituted code sections are larger.
  
Large programs written in assembler are hard to follow under any
conditions.  If the code is carefully documented, with the *intent*
of the code clearly spelled out, I don't think this is harder to
follow than any other technique, and I don't agree that it is "bad
practice."  If it keeps a volatile display from blinking, which it
can because it can be much faster that branch-and-test-condition code,
then I would label it as *good* practice.

It would be even better if it could be done in a HLL like, say, C --
with dangerous and confusing possibilities sharply restricted by the
language itself so the resulting code can be readily understood.  That 
was the idea behind "structured programming," back when it was a religion.

-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{backbones}!{noao,ut-sally}!utastro!nather
nather@astro.as.utexas.edu

jackson@esosun.UUCP (Jerry Jackson) (07/15/88)

In article <752@cernvax.UUCP> hjm@cernvax.UUCP (hjm) writes:

>>   As a final note, why is it 'clean' to alter a jump table and 'unclean' to alter
>>   an inline constant (e.g. jmp @offset(r0) uses a value in memory as the address
>>   but mov (pc)+,#1234 which loads an immediate does so too)?  Why the subtle
>>   difference?  Any thoughts on the subject?

	   Hubert Matthews

   (I don't consider LISP or PROLOG programs that create code on the fly to be
   SMC.  Does anyone disagree?)


The main difference that I see is that the code in one case is not
reentrant.  In systems like unix where it is possible for more than
one process to share a code segment at runtime, the jump table is
local to each process while the code is not.  Imagine if vi had true
self-modifying code in it...  (Many users typically share a code
segment for programs like vi and csh) The example in which you create
an array and then call it as a function does not really fit this
definition of self-modifying, since the code that is changed is
created in data space at runtime and hence cannot interfere with other
programs using the same code segment.

*Real* SMC would be something like:

main()
{
	char func();
	long *longptr;

	longptr = (long *)func;
	*(longptr + 20) = 0xffffffff;  /* shudder!!!!! */
}

+-----------------------------------------------------------------------------+
|   Jerry Jackson                       UUCP:  seismo!esosun!jackson          |
|   Geophysics Division, MS/22          ARPA:  esosun!jackson@seismo.css.gov  |
|   SAIC                                SOUND: (619)458-4924                  |
|   10210 Campus Point Drive                                                  |
|   San Diego, CA  92121                                                      |
+-----------------------------------------------------------------------------+

barmar@think.COM (Barry Margolin) (07/15/88)

In article <749@cernvax.UUCP> hjm@cernvax.UUCP () writes:
>If the program is considered to be data for an interpreter which I will for
>convenience call a CPU, then does self-modifying code seem any different?  As
>long as the data in this table remains consistent, then there should be no
>difficulty in dealing with it mathematically.  All of the problems of database
>consistency have been tackled (but not all solved) some time ago.  Code is data
>according to a PATCH program, and the only tricky bits there are to ensure that
>any changes fit inside the space that they replace and to patch up the jumps
>where necessary.  A bit like a structure linked by pointers, such as a binary
>tree, except that code is in general a directed cyclic graph (the directedness
>is due to the program counter).

This view is valid, but simplifies the issues.  If you are going to
think of the program as a database, you must actually consider it as a
distributed database.  This is because the data is replicated in
several places: the physical memory, the swapping device, the CPU
caches, and internal processor registers.  The problems of maintaining
consistency in replicated databases are much less well understood than
concurrency control in non-distributed databases.  Additionally, any
form of consistency maintenance must incur some performance penalty,
and hardware designers are under great pressure to make machines go as
fast as possible.  Multiple caches that do not maintain consistency
are one answer to this.  Since self-modifying code has generally been
considered bad style for quite some time, they felt comfortable
implementing a hardware optimization that assumes code will not modify
nearby locations.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

bclaus@wright.EDU (Brian Clausing) (07/15/88)

in article <5262@june.cs.washington.edu>, pardo@june.cs.washington.edu (David Keppel) says:
[...discussion about self-modifying code.  It fouls up caches.
It is intellectually unmanageable with current ISPs....]

As to the semantics of self-modifying code, isn't it identical to
a Turing machine's or am I missing something?  Of course, reduction
machines, like Turner's combinator-based implementation of SASL, are
inherently self modifying.
MB Clausing

jfh@rpp386.UUCP (John F. Haugh II) (07/15/88)

In article <3950008@hplvly.HP.COM> boyne@hplvly.HP.COM (Art Boyne) writes:
>pardo@june.cs.washington.edu (David Keppel) writes:
>
>> Many machines use seperate instruction and data caches.  Instruction
>                    ^^^^^^^^
>To quote a high school chemistry teacher:
>
>'When will you guys realize that there is "a rat" in sep*arat*e?'

to paraphrase andrew jackson, president of the united states, and face on
the twenty dollar bill,

	how dull the mind which can only think of one
	spelling for a word.

- john.
-- 
John F. Haugh II                 +--------- Cute Chocolate Quote ---------
HASA, "S" Division               | "USENET should not be confused with
UUCP:   killer!rpp386!jfh        |  something that matters, like CHOCOLATE"
DOMAIN: jfh@rpp386.uucp          |             -- with my apologizes

baum@Apple.COM (Allen J. Baum) (07/15/88)

[]
>In article <1087@ficc.UUCP> peter@ficc.UUCP (Peter da Silva) writes:
>I have a question:
>
>	Why are an Icache plus a Dcache better than just
>	a big shared cache as big as both?

In terms of hit/miss ratios, a unified cache is clearly better. However, in
terms of effective access time, a split I/D cache is better. This is because
both can be accessed simultaneously, instead of one having to wait for the
other to finish. If both are getting accessed simultaneously (which should
happen 40% of the time, if Loads/Stores account for %40 of instructions), then
this more than offsets the increase in miss ratio.

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385

tim@amdcad.AMD.COM (Tim Olson) (07/16/88)

In article <1087@ficc.UUCP> peter@ficc.UUCP (Peter da Silva) writes:
| I have a question:
| 
| 	Why are an Icache plus a Dcache better than just
| 	a big shared cache as big as both?

When the processor has separate instruction and data buses, so that
concurrent accesses can occur to both caches.

-- 
	-- Tim Olson
	Advanced Micro Devices
	(tim@delirun.amd.com)

lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) (07/16/88)

In article <1744@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes:
>
>There are interesting cases where on-the-fly generation of code seems
>to be essential to get good asymptotic space and/or time complexity.
>
>Consider the following code fragment (version A):
>
>     for (i = 0; i<L; i++)
>     { if (c1) p1();
>       if (c2) p2();
>       ...
>       if (cN) pN();
>     }
>
>The Boolean variables c1, c2, ... cN are assumed to be loop invariants.

(...version B deleted...)

>The following lets you get both (version C):
>
>     start generating code;
>     if (c1) generate instruction to call p1;
>     if (c2) generate instruction to call p2;
>     ...
>     if (cN) generate instruction to call pN;
>     for (i = 0; i<L; i++) execute generated instructions;

Interesting. This reminds me a lot of the kind of compile-time manipulation
that optimizing compilers do. The loop

     for (i = 0; i<L; i++) execute generated instructions;

is an optimized version of A *when the ci are known*. If they were known at
compile-time an optimizing compiler could have generated the same code. The
only difference is that the self-modifying program does this *at runtime*,
when we are guaranteed to know the values of the ci's. Maybe we should call
this technique "run-time compilation"?

Bjorn Lisper

smryan@garth.UUCP (Steven Ryan) (07/16/88)

>'When will you guys realize that there is "a rat" in sep*arat*e?'

Why do you use *realism* and *realist* but then switch to *realize*?

                             s m ryan

-------------------------------------------
Self-descriptive sentences are irritating.

pardo@june.cs.washington.edu (David Keppel) (07/16/88)

In article <1087@ficc.UUCP> peter@ficc.UUCP (Peter da Silva) writes:
>[ Why are two caches (I, D) better than one (I+D)? ]

For a given real-estate, a non-snoopy I-cache will hold more data
bits and be faster than a snoopy I-cache.

+ For a given real-estate, the I-cache hit rate will be better.  This
  makes the average instruction fetch time lower.  This may be a win
  if instruction fetch rate is at all a bottleneck.

+ For a given cache size, the real-estate can be smaller.  This means:
 + The cache may be cheaper.
 + The cache may be small-enough to put closer to the instruction
   register, making it effectively faster.

+ The logic is simpler and more regular.
  + Faster design times.
  + Fewer bugs.
  + Easier to incorporate into VLSI designs (==faster).
  + Less logic to drive => faster, less power.

+ The cache doesn't need to be connected to as many things.
  + More freedom to place the cache => faster, cheaper.
  + Less external logic to drive => smaller, faster, cheaper.

+ Aliasing and data distribution are less of a problem (or none at all).
  This lets you build (with much less pain)
  + Hierarchical cache organizations (faster).
  + Virtually-addressed caches (faster).

I-caches are particularly useful on machines that make heavy use
of memory.  This includes:
+ Many high-performance processors.
+ Shared-memory multiprocessors.

The primary problem (that I'm aware of) with non-snoopy I-caches
is that you must manually flush the I-cache every time that an
instruction is changed.  (Strictly speaking, every time you change an
I-space address that has been touched since the last cold start).

Does that answer your question ?-)

	;-D on  ( Faster )  Pardo

peter@athena.mit.edu (Peter J Desnoyers) (07/16/88)

In article <12382@ut-sally.UUCP> nather@ut-sally.UUCP (Ed Nather) writes:
 [talking about properly written self-modifying code]
>
>It would be even better if it could be done in a HLL like, say, C --
>with dangerous and confusing possibilities sharply restricted by the
>language itself so the resulting code can be readily understood.  

The example that started this whole conversation was partial
application. That IS "normally" incorporated into languages, although
usually obscure, esoteric ones. Or else in Lisp, where it is partly
supported as closures. The example faked partial application in C,
which, like recursion in FORTRAN, is doable (on some architectures)
but ugly.

The necessity of using self-modifying code to implement partial
application does not make it bad programming practice, any more than
the necessity of using ML gotos to implement 'if' makes if-then
statements bad programming practice. However, if you can't isolate the
grungy part in the compiler (preferably) or a system call, then any
advantages of this programming paradigm may be lost in the complexity
(and danger) of using it.
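For comparison, the usual way to fake partial application in C without
generating code is to carry the bound argument in a record next to a
function pointer (a sketch; all names invented):

	struct padd {
		int (*fn)(int, int);   /* the original two-argument function */
		int bound;             /* the argument fixed in advance      */
	};

	static int add(int a, int b) { return a + b; }

	static int apply1(struct padd *p, int x) { return p->fn(p->bound, x); }

	/* struct padd add5 = { add, 5 };  apply1(&add5, 3) == 8 */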

				Peter Desnoyers
				peter@athena.mit.edu
	

petolino%joe@Sun.COM (Joe Petolino) (07/16/88)

>>	Why are an Icache plus a Dcache better than just
>>	a big shared cache as big as both?
>
>In terms of hit/miss ratios, a unified cache is clearly better.

I beg to differ.  We ran a few simulations, comparing split and unified
caches (total cache size in the 32K-512K range), and when the caches were
direct-mapped (i.e. 1-way set-associative) the unified cache performed worse. 
Our guess is that when code and data are forced to co-exist in the same
space, you get a high probability of collision and thrashing.  This effect
went away when we increased the degree of set associativity.  One way to
think about it is that having two (direct-mapped) caches gives you some of
the benefits of set-associativity.  Take this with a grain of NaCl: we only
tried a few test programs, and each was small enough to fit into the cache.

-Joe

fst@mcgp1.UUCP (Skip Tavakkolian) (07/16/88)

In the March/April 1987 issue of the AT&T TECHNICAL JOURNAL (volume 66, issue 2) there
is an article about ``PICO - A PICTURE EDITOR'' by Gerard J. Holzmann. Its
description of PICO is ``an interactive editor for digitized graphic images''.
The interesting point about the VAX-750 version of this editor is that
it ``contains an optimizing compiler that translates the expressions
"on-the-fly" (interactively) into efficient code for the VAX-750 computer''.
The author acknowledged Ken Thompson as the person who wrote the ``on-the-fly''
optimizing compiler.  Is this considered self-modifying?  If so, could
one of the elders (a term of endearment) please shed light on this subject?

Sincerely
-- 
Fariborz ``Skip'' Tavakkolian

UUCP	...!uw-beaver!tikal!mcgp1!fst

tps@chem.ucsd.edu (Tom Stockfisch) (07/16/88)

In article <33652@yale-celray.yale.UUCP> lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) writes:
>In article <1744@vaxb.calgary.UUCP> radford@calgary.UUCP (Radford Neal) writes:
>>There are interesting cases where on-the-fly generation of code seems
>>to be essential to get good asymptotic space and/or time complexity.
>>
>>     for (i = 0; i<L; i++)
>>     { if (c1) p1();
>>       if (c2) p2();
>>       ...
>>       if (cN) pN();
>>     }
>>[SMC solution follows]
>>     start generating code;
>>     if (c1) generate instruction to call p1;
>>     if (c2) generate instruction to call p2;
>>     ...
>>     if (cN) generate instruction to call pN;
>>     for (i = 0; i<L; i++) execute generated instructions;

Wouldn't the following be "nearly" as efficient, and be portable to boot?
At least it has the same asymptotic complexity.

	enum { END, P1, P2, ..., Pn }	code[MAXCODE], *p;

	p = code;
	if (c1)	*p++ =	P1;
	if (c2)	*p++ =	P2;
	...
	if (cn)	*p++ =	Pn;
	*p =	END;

	for ( pixel = 0; pixel < NPIXELS; pixel++ )
		for ( p = code; *p != END; )
			switch (*p++)
			{
			case	P1:
				[p1 instructions]
				break;
			case	P2:
				[p2 instructions]
				break;
			...
			case	Pn:
				[pn instructions]
				break;
			default:
				assert(0);
				/*NOTREACHED*/
			}
-- 

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu

sean@lfcs.ed.ac.uk (Sean Matthews) (07/16/88)

With this talk about self modifying code, no one has mentioned the
theoretical work that has been done on the subject.

There is a quite well known paper (and quite controversial) on the
subject of programs that can modify themselves and their interpreters
by Brian Smith

Reflection and Semantics in a procedural language
MIT-LCS-TR-272 Mass. Inst. Tech.
January 1982

Reflection and semantics in Lisp
11th ACM symposium on principles of programming languages

Then there is the reply to these papers by
Mitchell Wand and Daniel Friedman

The mystery of the tower revealed:
a non-reflective description of the reflective tower
CACM 1986
(this is as far as I can go since I have a copy of a copy and there
is no publishing information on it)

An extended version of this paper was published in
Meta-level architectures and reflection,
P. Maes and D. Nardi (editors)
Elsevier Science Publishers B.V. (North-Holland) 1988

There is a lot of theory about self modifying systems and
self referential systems but by the time you start looking into it
you are in philosophical logic, not comp.architecture

Se\'an Matthews
arpa: sean%uk.ac.ed.aipna@nss.cs.ucl.ac.uk

mash@mips.COM (John Mashey) (07/16/88)

In article <1087@ficc.UUCP> peter@ficc.UUCP (Peter da Silva) writes:

>	Why are an Icache plus a Dcache better than just
>	a big shared cache as big as both?

1) (Major) with split caches, you can access a word of instruction and
a word of data in the same cycle.  With a joint cache, you get one or
the other.  As a truly gross estimate, expect to lose 10-40% of your
performance, given conflicts between I-refs and loads/stores.

2) Less important, but still useful.  As you make a direct-mapped cache
bigger, each successive 2X improves the hit-rate, but it improves it less than
the last 2X.  At some point, it is hard to make it bigger and keep the same
speed SRAMs, and then the only way (keeping the same organization) to improve
the hit rate is to make it set-associative.  Given the same total
cache size, hit rates for many programs (not all, there are exceptions):
	joint, direct-mapped <=
	split, direct-mapped <=
	joint, 2-way set associative
Note: those are Hit Rates, not performance.  There are a bunch of reasons
to keep caches (at least the ones nearest the CPU) direct-mapped when
you can.

A fine paper that analyzes the various tradeoffs in
cache design [rather than just hit rates] is:

	S. Przybylski, M. Horowitz, J. Hennessy, "Performance Tradeoffs
	in Cache Design", Proc. 15th Annual Symposium on Computer
	Architecture, May 30 - June 2, 1988, Honolulu, Hawaii.
	(Computer Architecture News 16, 2 (May 1988), 290-298.)
	
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

bill@proxftl.UUCP (T. William Wells) (07/16/88)

In article <60175@sun.uucp>, petolino%joe@Sun.COM (Joe Petolino) writes:
> >>    Why are an Icache plus a Dcache better than just
> >>    a big shared cache as big as both?
> >
> >In terms of hit/miss ratios, a unified cache is clearly better.
>
> I beg to differ...
>                                  ...Take this with a grain of NaCl: we only
> tried a few test programs, and each was small enough to fit into the cache.

Another relevant point about separate caches is this:
instructions and data have different access patterns.  A cache
designed for the one is not necessarily going to be right for the
other; a cache designed for both may not do as well as the
separate ones.  So this possibility has to be balanced against
the general benefit of having a larger pool to work with.

sl@van-bc.UUCP (pri=-10 Stuart Lynne) (07/17/88)

In article <752@cernvax.UUCP> hjm@cernvax.UUCP () writes:
>I have been mulling over the idea of self-modifying code (SMC) for a while and
>I've come to the conclusion that there is no good definition of SMC.

>For example, if treating code as data is the definition, then does passing a
>procedure as a parameter in PASCAL, or a pointer to a function in C count?
>Probably not.

>OK, what about a jump table.  Consider an array of pointers to functions in C.
>Does changing the pointers count as SMC?  Again, I don't think so.

>So, changing a pointer by assigning to it is not SMC, but putting a new jump
>instruction in (e.g. jmp #somewhere_else) in place of an existing instruction
>is SMC.  Does one level of indirection really make that much difference?


A simple definition might be any program that cannot be compiled and run
in shared text mode (or the equivalent for non-Unix application environments).

Modifying a jump table in your data space does not affect how the program
will run for other users. 

Modifying a jump instruction in the shared text *will* affect how the
program will run for other users.


I have used SMC in places that were by design not required to be shared and
where a high degree of efficiency was required. For example, in a p-System
type interpreter you need to have a test in the opcode fetch and dispatch
loop for a pseudo-interrupt. This takes a lot of time. By simply patching in
a jmp to_int as the first instruction of the loop we eliminate the need for
an explicit test. 

To make the above example work for multi-user systems, we used a jump
indirect through a var which contained either the ifetch address or the
interrupt handler address. A little slower, but still faster than an explicit
test in the ifetch loop.
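
To make that concrete, here is a minimal C sketch of the indirect-dispatch
version (all names are invented for this example; the real thing would sit in
an interpreter's fetch loop): the hot loop just jumps through a pointer, and
posting a pseudo-interrupt means swapping that pointer instead of testing a
flag on every opcode.

#include <stdio.h>

static void do_fetch(void);
static void do_interrupt(void);

static void (*next_step)(void) = do_fetch; /* the "patched" slot, kept in data */
static int pc, running = 1;

static void do_fetch(void)
{
    /* fetch and dispatch one pseudo-op; details elided */
    if (++pc > 5)
        running = 0;
}

static void do_interrupt(void)
{
    puts("pseudo-interrupt serviced");
    next_step = do_fetch;           /* resume normal fetching */
}

void post_interrupt(void) { next_step = do_interrupt; }

int main(void)
{
    post_interrupt();               /* in real code, from whatever signals the interrupt */
    while (running)
        (*next_step)();             /* one indirect jump, no per-opcode flag test */
    return 0;
}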

Similar problems can arise when writing high-performance device
drivers. 


-- 
Stuart.Lynne@wimsey.bc.ca {ubc-cs,uunet}!van-bc!sl     Vancouver,BC,604-937-7532

leo@philmds.UUCP (Leo de Wit) (07/17/88)

In article <752@cernvax.UUCP> hjm@cernvax.UUCP () writes:
>I have been mulling over the idea of self-modifying code (SMC) for a while and
>I've come to the conclusion that there is no good definition of SMC.
>
>For example, if treating code as data is the definition, then does passing a
>procedure as a parameter in PASCAL, or a pointer to a function in C count?
>Probably not.

True. A pointer is data.

>OK, what about a jump table.  Consider an array of pointers to functions in C.
>Does changing the pointers count as SMC?  Again, I don't think so.

Again true. Note that the system (O.S., processor) may check whether
the new pointer value represents a valid address, or a valid entry
point. Whether this is (always) desirable is an entirely different
question. E.g. some architectures will protest if you try to jump to
an odd address. PASCAL will probably leave you less room to cheat
than C.

>So, changing a pointer by assigning to it is not SMC, but putting a new jump
>instruction in (e.g. jmp #somewhere_else) in place of an existing instruction
>is SMC.  Does one level of indirection really make that much difference?

Yes, I think it does make a difference. Maybe not always, but there are
cases where you can't get away with this: think of re-entrant code, or
shared text segments. Now, thinking of recursion: what about putting the
code on the stack 8-) ? No worry about re-entrancy, and the C program
model becomes more complete (we already have global, static and
automatic data, and global and static functions, and now there are
automatic functions...).

>Of course, if you want to be really horrid in C, you can try something like
>this:
>
>char codearray[] = { 0x03, /* one byte instruction for something */
>		     0xc9  /* one byte return instruction */
>		   }

This must be (or must have been) a Z80 freak! At least I recognize 0xc9
as RET.  On a Z80 you could do other horrible things too, since
instruction sizes vary from 1 to 4 bytes; by carefully picking your
instructions you could use the same code twice (entering it at an offset
of one byte), with entirely different results. Good (?) old days, when
memory was more expensive than programming effort....

>and then 'call' this function using a cast to turn the pointer codearray into
>a pointer to a function.  (Who needs #asm anyway!)  Then you can modify the
>code as much as you want.  This _is_ SMC without a doubt, because you can
>overwrite code.  So, I propose a very weak definition for SMC as code that
>writes over other code.

Note that not all implementations will allow initialized arrays to be
altered.  I remember a problem we had last year on VMS while passing an
initialized char array to mktemp() (or whatever its name is); the
program dumped its stack because mktemp tried to write into the readonly
array (yes, VMS put it in readonly!).
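
For the record, here is roughly what the codearray trick looks like written
out.  This is a sketch only, not portable C: the byte value is a placeholder
for real machine code (0xc9 is RET on a Z80), and both the cast and any later
write into the array are outside what the language guarantees -- as the VMS
incident shows, the array may not even be writable.

unsigned char codearray[] = { 0xc9 };   /* placeholder machine code */

int main(void)
{
    void (*f)(void) = (void (*)(void)) codearray;  /* "who needs #asm anyway" */
    f();        /* call into the array; works only on forgiving machines */
    return 0;
}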

>As a final note, why is it 'clean' to alter a jump table and 'unclean' to alter
>an inline constant (e.g. jmp @offset(r0) uses a value in memory as the address
>but mov (pc)+,#1234 which loads an immediate does so too)?  Why the subtle
>difference?  Any thoughts on the subject?

Try putting your code in ROM and see what happens. Just one example.
Besides, I think jump tables are generally not altered; pointers are.
The jump tables could well be in the text segment, for instance.
A jump table is not an array of function pointers.

>	Hubert Matthews
>
>(I don't consider LISP or PROLOG programs that create code on the fly to be
>SMC.  Does anyone disagree?)

No, unless the code is buggy; such code on a fly could well be
Self Multiplying 8-).

  Leo.

smryan@garth.UUCP (Steven Ryan) (07/17/88)

>As to the semantics of self-modifying code, isn't it identical to
>a Turing machine's or am I missing something?

No. The state mechanism of a Turing machine is fixed and finite.

To show equivalence, it is necessary to show that self modifying code
does not increase the number of states. Usually, people just appeal
to Church-Turing and assume all the states are there, somewhere.

For something like a load-and-go compiler, since not every possible program
is hidden somewhere in the compiler, it would appear to escape CT.
However the compiled program can be regarded as a transformation of the
input tape into another region of the tape and its apparent execution
as an interpretation of the tape by the compiler/loader/cpu state machine.

Then again, if you believe really hard and clap your hands.....

                            sm ryan

----------------------------------
Hugh: Why does he carry a sword?
Fred: He has delusions of grandeur.
him:  Uh, Fred...
      (SWOOP)
Fred: YEEK
him:  ...before you make a value judgement about a person's delusions...
      (poink)
Fred: AWK! My head feathers!
him:  ...you'd better be absolutely sure they ARE delusions.
      (JAB)
Fred: OUCH!
                    -- Odd Bodkins

elg@killer.DALLAS.TX.US (Eric Green) (07/17/88)

in article <4776@killer.UUCP>, chasm@killer.UUCP (Charles Marslett) says:
> In article <33441@yale-celray.yale.UUCP>, lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) writes:
>> Another related issue is that self-modifying code cannot be executed
>> directly from ROM. Executing programs in ROM is an important memory-saving
>> technique in small, diskless, special-purpose systems.
> 
> Actually, I always thought that ROM was much more important as a way of
> eliminating the need to load the code and as a way of guaranteeing non-
> self-modifying-ness.  Usually ROM code is larger (thus memory-consuming)
> than RAM-loaded code. 

Uh, I don't know what systems you're talking about, but for the small
microcontrollers that I've used -- the ROM is assembled to be placed
in the memory map at an absolute location. That is, it takes up the
same amount of space whether it's in ROM or RAM (in fact, one of my
favorite hardware designs is a little SRAM card that I use to spoof
ROM for prototyping purposes).  As for re-using memory buffers etc.,
those are always in RAM, so those are always re-usable -- no matter
where your executable is loaded. 

32K bytes of ROM costs about 1/4th of what 32K bytes of static RAM
costs (not to mention the eliminating-need-to-load). Note that these
are dedicated control systems using 6502's, Z-80's, and 6809's, which
have a total address space of 64K -- so 32K of ROM is a heckuva lot.

> Seriously, newer architectures have reduced the difference (to 0 perhaps),
> but the emphasis on RISC these days may resurrect self-modifying code --
> a RISC-er technique is not known to mankind!  (:-D}===<<).

Actually, RISC may be the stake-in-the-heart of self-modifying code.
RISC technology overcomes the increased memory bandwidth requirement
by using enhanced cache technology, 256-byte-at-a-time prefetch from
page-mode DRAM's, heavy pipelining, and every other trick in the book.
Most of which are quite contrary to the thought of self-modifying
code. Another RISC trick is to use the saved silicon for extra
registers, and have the compilers keep as many values as possible in
the CPU registers. You can always do a one-word memory-indirect load
off of a register much faster than you can do a "memory-write (the
address into the instruction), memory-read (the address part of the
modified instruction), memory-read" triplet (which is at least three
two-word instructions, versus a single one-word instruction, on a
machine where one cycle = one word processed).

But, I HAVE done self-modifying code on a 6502-based microcontroller
(out of sheer necessity). I still make an occasional enhancement to
that software, over 3 years later, as new applications arise, and have
had no problems with readability. After all, in assembly on a
microcontroller, you have to save the address somewhere, and the
fact that the "somewhere" happens to be inside an assembly language
instruction makes little difference. The argument against
self-modifying code lies elsewhere besides readability.

--
Eric Lee Green    ..!{ames,decwrl,mit-eddie,osu-cis}!killer!elg
          Snail Mail P.O. Box 92191 Lafayette, LA 70509              
       MISFORTUNE, n. The kind of fortune that never misses.

g-rh@cca.CCA.COM (Richard Harter) (07/17/88)

In article <989@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>>As to the semantics of self-modifying code, isn't it identical to
>>a Turing machine's or am I missing something?

>No. The state mechanism of a Turing machine is fixed and finite.

>To show equivalence, it is necessary to show that self modifying code
>does not increase the number of states. Usually, people just appeal
>to Church-Turing and assume all the states are there, somewhere.

	This needs some qualification.  Strictly speaking no computers
are Turing machines because they do not have an infinite memory or 
equivalent.  However almost all computers would be universal Turing
machines if they somehow did.

	However it depends on what level you are speaking of.  If you
are talking about the hardware, that's a *qualified* Turing machine.
You can talk about a language being a virtual Turing machine; that makes
sense, albeit you want to be more careful with your language than I am
being here.  The language remains fixed, and most languages are universal
Turing machines.  In this case the SMC is data.  If you want to talk
about a specific program as a virtual Turing machine (probably not
universal) then, yes, it can't modify itself.
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

conybear@moncsbruce.oz (Roland Conybeare) (07/18/88)

From article <752@cernvax.UUCP>, by hjm@cernvax.UUCP (hjm):
> As a final note, why is it 'clean' to alter a jump table and 'unclean' to 
> alter an inline constant (e.g. jmp @offset(r0) uses a value in memory as the
> address but mov (pc)+,#1234 which loads an immediate does so too)?  Why
> the subtle difference?  Any thoughts on the subject?
> 
> 	Hubert Matthews

I can see several reasons.

* the big, big reason for referring to code via pointers, and getting the effect
  of self-modifying code via such pointers, is that you make your changes
  independent of the size of the code.  Real SMC will only work when the new
  code is no larger than the old code.  I think this is a very restrictive
  assumption.

* when you alter a jump table (in C, at least) you are doing so within the 
  language, and can expect the compiler to understand you.  A language which
  allows you to modify instructions directly would of necessity depend strongly
  on the machine architecture to run these instructions.  Otherwise, why don't
  we all use Universal Assembly Language?
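
A small C illustration of the jump-table point (the function names are made
up for the example): swapping a table entry is ordinary data assignment, and
the replacement routine can be any size at all -- exactly the property that
patching instructions in place does not have.

#include <stdio.h>

static void op_fast(void)  { puts("fast version"); }
static void op_debug(void) { puts("debug version (could be arbitrarily larger)"); }

static void (*optab[1])(void) = { op_fast };    /* jump table lives in data */

int main(void)
{
    optab[0]();             /* dispatch through the table        */
    optab[0] = op_debug;    /* "modify" the program without ever */
    optab[0]();             /* touching the text segment         */
    return 0;
}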

Roland Conybeare
conybear@moncsbruce.oz

daveb@geac.UUCP (David Collier-Brown) (07/18/88)

From article <33441@yale-celray.yale.UUCP>, by lisper-bjorn@CS.YALE.EDU (Bjorn Lisper):
> In article <12357@ut-sally.UUCP> nather@ut-sally.UUCP (Ed Nather) writes:
>>In article <5254@june.cs.washington.edu>, pardo@june.cs.washington.edu
> (David Keppel) writes:

[a long, and generally well-reasoned debate elided]

>>Sorry, no.  I've heard LOTS of arguments against programs that generate their
>>own code, 

  Gentles, could we ***PLEASE*** reserve the term "self-modifying code" for
code which actually modifies itself on the fly (eg, for generating
indexes by instruction modification instead of using index registers)?

  Much of what is being discussed here is part of the well-known and
respectable "generate and execute" paradigm. The only difference
between this and the normal code-generation paradigm is that an
application generates code at run-time, not a compiler at a previous
time.

  Mixing the two is making this discussion "noisy".

--dave (sorry about the pedantry, but it's important) c-b
-- 
 David Collier-Brown.  {mnetor yunexus utgpu}!geac!daveb
 Geac Computers Ltd.,  |  Computer science loses its
 350 Steelcase Road,   |  memory, if not its mind,
 Markham, Ontario.     |  every six months.

baum@Apple.COM (Allen J. Baum) (07/19/88)

[]

For those of you interested in on-the-fly generated code, here is a reference
to an interesting article:
Turning interpreters into Compilers
Frank Pagan -C.S.U. Fullerton
June 88 Software- Practice and Experience

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385

hwfe@ur-tut (Harlan Feinstein) (07/19/88)

Reasons for writing self-modifying code have been scarce, and have basically
centered on speed.  But speed is not the only reason someone would want to
write s-m code.  I've seen a program for the IBM PC called PC-LOCK that
prohibits one from accessing the hard disk without providing a password.
The install and de-install programs are self-modifying to make tracing them
all but impossible.  

One time last summer I changed my password right before the weekend, and
wouldn't you know it, come Monday I had no clue as to what the password was.
I tried all kinds of disassembly and debug tracing to no avail.  Hence,
another reason for self-modifying code.

--harlan

hjm@cernvax.UUCP (hjm) (07/19/88)

Programmers have become quite accustomed to the destructive assignment of
data ( x := y + 2 ), so why should the destructive changing of code be any
different?  If the practice were more common, then it wouldn't seem quite so
bad, surely.  After all, a Turing machine with a finite length tape is not
*that* difficult ...

	Hubert Matthews

colwell@mfci.UUCP (Robert Colwell) (07/21/88)

In article <759@cernvax.UUCP> hjm@cernvax.UUCP () writes:
>
>Programmers have become quite accustomed to the destructive assignment of
>data ( x := y + 2 ), so why should the destructive changing of code be any
>different?  If the practice were more common, then it wouldn't seem quite so
>bad, surely.  After all, a Turing machine with a finite length tape is not
>*that* difficult ...

Hubert, I was going to let this whole SMC discussion slide on by, but
your post finally convinced me otherwise.

First, you meant "y" := y + 2, not "x", right?

Second, it seems like only yesterday when we (the royal we) CPU
architects were so concerned with trying to narrow the semantic gap
between what a programmer was trying to express and what the
machine/language was coercing her into.  Languages like Ada and
machine architectures like capability machines were intended to
address this perceived need.  The basic idea is that (oversimplifying
drastically) in a small programming environment (you at your
standalone workstation) anything goes in terms of protection and
language type checking, etc.  

The interesting domain is that of programming-in-the-large, with
large teams of programmers producing hundreds of thousands of lines
of code (SDI/Shuttle).  There, extracting the last nano-mip of
performance is of far less interest than in avoiding bugs and
producing maintainable code, and I still believe that supporting that
environment is a far more important challenge to architects than
worrying about SMC could ever be.  And I don't think (heavy sarcasm
here) that UNIX/C is the ultimate answer to that environment.




Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090

guy@gorodish.Sun.COM (Guy Harris) (07/21/88)

> Second, it seems like only yesterday when we (the royal we) CPU
> architects were so concerned with trying to narrow the semantic gap
> between what a programmer was trying to express and what the
> machine/language was coercing her into.  Languages like Ada and
> machine architectures like capability machines were intended to
> address this perceived need.

A naive (and not rhetorical) question: what evidence is there to indicate the
degree to which "narrowing the semantic gap" with capability machines and the
like would improve the productivity of programmers or the reliability of
programs, and to which other techniques (e.g., making a very fast conventional
machine, possibly but not necessarily using RISC technology, and using that
speed to improve the programming environment with better software) achieve the
same goal?

peter@athena.mit.edu (Peter J Desnoyers) (07/21/88)

In article <60782@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>
>A naive (and not rhetorical) question: what evidence is there to indicate the
>degree to which "narrowing the semantic gap" with capability machines and the
>like would improve the productivity of programmers or the reliability of
>programs, and to which other techniques [fast machines, good software]
>achieve the same goal?

Some protection and tracing features are MUCH slower in software.
These features are also useful in writing wonderful debuggers. You
need a bus snooper or ICE with an 8088 to achieve debugger features
that can be done in software on the 68020 or 80386. And per-task
memory protection is a big win in any environment, but impossible in
software. 

				Peter Desnoyers
				peter@athena.mit.edu

guy@gorodish.Sun.COM (Guy Harris) (07/22/88)

> >A naive (and not rhetorical) question: what evidence is there to indicate
> >the degree to which "narrowing the semantic gap" with capability machines
> >and the like would improve the productivity of programmers or the
> >reliability of programs, and to which other techniques [fast machines, good
> >software] achieve the same goal?
> 
> Some protection and tracing features are MUCH slower in software.
> These features are also useful in writing wonderful debuggers. You
> need a bus snooper or ICE with an 8088 to achieve debugger features
> that can be done in software on the 68020 or 80386. And per-task
> memory protection is a big win in any environment, but impossible in
> software. 

OK, maybe I didn't make myself clear.  I'm not referring to features you find
on many random processors or systems out there, such as memory management units
or "trap on transfer of control/trap on reference to particular location"
breakpoints.  I'm referring to the sort of architectural features you find in
machines that, say, require references to objects to go through a "descriptor"
for the object, to do bounds checking and the like.

One instantiation of the question would be "are you better off doing that or
having a fast machine plus a compiler that can e.g.  generate bounds checking
code but avoid doing it in some cases where it's 'known' that you don't need
it?"  For instance, don't bother doing it in

	double	a[SIZE], b[SIZE], c[SIZE];
	int	i;

	for (i = 0; i < SIZE; i++)
		a[i] = b[i] + c[i];

(or the FORTRAN, or Ada, or COBOL, or... equivalent), since you know "i" will
always be within the bounds of all the arrays.

Not "do you think you'd be better off" or "do you have an analysis that makes
it 'intuitively obvious' that you'd be better off", but "has anybody compared
actual implementations of the two approaches, and concluded that, with
everything taken into account, you're better off with one or the other?"  I.e.,
taking into account the fact that microcode and hardware are like software in
that it is possible for them to be buggy (either designed incorrectly or
implemented incorrectly), and taking into account the time it takes to design
that system, and....

colwell@mfci.UUCP (Robert Colwell) (07/22/88)

In article <60782@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>> Second, it seems like only yesterday when we (the royal we) CPU
>> architects were so concerned with trying to narrow the semantic gap
>> between what a programmer was trying to express and what the
>> machine/language was coercing her into.  Languages like Ada and
>> machine architectures like capability machines were intended to
>> address this perceived need.
>
>A naive (and not rhetorical) question: what evidence is there to indicate the
>degree to which "narrowing the semantic gap" with capability machines and the
>like would improve the productivity of programmers or the reliability of
>programs, and to which other techniques (e.g., making a very fast conventional
>machine, possibly but not necessarily using RISC technology, and using that
>speed to improve the programming environment with better software) achieve the
>same goal?

As far as I know, there is no evidence that you would necessarily
find compelling, but then I could say that same thing about almost
everything else in this field, too.  There are, on the other hand,
some good reasons to believe that we can do better than imperative
languages running on essentially unprotected architectures.

I know I can't do this topic justice in this forum, but here's my
quick take on the subject.  Think about the different computer
languages you have used, and what was good or bad about each.  Backus
argued (in his famous paper on functional languages) that one of the
reasons that Lisp is a good language is that you don't have to
mentally execute a program (as you do with imperative languages) in
order to convince yourself that it will do what is wanted.  The
language allows you to more closely express what is desired, so you
test correctness by inspection.  And yes, there are cases where it's
more obvious or easier to express something in C than in Lisp.  But both
examples serve to illustrate the point -- you, as a programmer, have
some virtual entity you want to realize (data structure or
algorithm), and the closer you can get to realizing exactly what you
have in mind, the more likely the code is to be correct,
maintainable, and understandable (and the smaller the semantic gap
between what you want and what you can express).

That's partly an allegory.  The capability people argue that the same
thing extends into all areas of computer systems.  Recall the classic
arguments about turning off runtime bounds checking to reclaim that
lost performance -- why should a programmer, in the best of all
possible worlds, have to worry about things like that?  It doesn't
help any of the major productivity metrics.

Say what you want about Ada on the 432 (and I've said plenty
already), as a programming environment (forget the speed problems
temporarily) it was really nice.  Your program had access to only the
data structures you specified, and only in the ways you specified,
and if your code screwed up, the machine could tell you where and in
what way.  To me, the hardest bugs to find are those where you fix
one bug (in someone else's code, of course) and inadvertently break
it in some obscure way such that something completely unrelated is
getting trashed.  For this to even be possible (in my opinion) means
that the programming environment is lacking something crucial,
notwithstanding that about 95% of all programmers on this planet see
just such an environment.  The environment is failing to express what
the programmer wanted, and it's a combined failure of the machine
architecture, the language, and probably the OS too.  The semantic
gap argument says that the more the desired and achieved environments
differ, the larger the quantity of bugs, and the worse their
obscurity will be.

I know you're tempted at this point to say that even if one grants
this as a shortcoming of current architectures/environments, there
have been no successes so far at addressing it.  That's another topic
of conversation, I think; all I'm trying to do here is offer some of
the justification for why a lot of research has gone into other ways
to compute (that don't have sheer instrs-executed-per-unit-time as
their primary figure of merit).

All the strong-type-checking stuff built into some languages has the
same motivation as above.

For me, the bottom line is that, as usual, there aren't any easy
answers (because if there were somebody would've patented them by
now), but we shouldn't lose track of the corresponding questions just
on that basis.  The problem is getting worse, after all -- more, not
fewer, lives currently depend on the correctness of software than
ever before, and that trend will continue (fly-by-computer airplanes,
dynamically-unstable planes, military systems, ...).

I assume that the reason people like you and me are concentrating on
performance is that that's what sells.  I don't think that needs
defending, but I also see few reasons to believe that sheer
performance alone will provide the cures to the problems I tried to
outline above.

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090

henry@utzoo.uucp (Henry Spencer) (07/24/88)

In article <473@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
>... The capability people argue that the same
>thing extends into all areas of computer systems.  Recall the classic
>arguments about turning off runtime bounds checking to reclaim that
>lost performance -- why should a programmer, in the best of all
>possible worlds, have to worry about things like that?  ...

My counterargument is that it is almost as much of an imposition on the
programmer to have such checks done at runtime as it is to have them not
done at all.  Given the impossibility of complete testing, there is no
way to guarantee such tests will catch a bug during development.  This
means that the programmer has to go through the same complicated exercise
of trying to assure himself that the problem will never happen.  What is
really wanted is a way to check these properties at COMPILE TIME... in
which case the runtime checks become largely superfluous anyway.
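
A small C sketch of that distinction (the array and functions are invented
for the example): the first loop's bound can be verified at compile time,
while the second access depends on its argument and so needs either a
run-time check or a compile-time argument about every caller.

int tab[10];

int sum_first_three(void)
{
    int i, s = 0;
    for (i = 0; i < 3; i++)     /* 3 <= 10: provable at compile time */
        s += tab[i];
    return s;
}

int lookup(int n)               /* n comes from input, range unknown */
{
    if (n < 0 || n >= 10)       /* the check a compiler cannot remove
                                   without knowing n's range */
        return -1;
    return tab[n];
}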

Can it be done?  Well, in one sense the answer is clearly yes, because a
proof of program correctness has to include it, and we know that automating
such proofs is possible (although seldom practical at the moment).  The
question is whether it can be done with practical, affordable machinery
without crippling the language.  My own long-held conjecture is that the
answer is "yes", but I don't have proof of that yet.
-- 
Anyone who buys Wisconsin cheese is|  Henry Spencer at U of Toronto Zoology
a traitor to mankind.  --Pournelle |uunet!mnetor!utzoo! henry @zoo.toronto.edu

colwell@mfci.UUCP (Robert Colwell) (07/24/88)

In article <1988Jul23.221743.22169@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <473@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
>>... The capability people argue that the same
>>thing extends into all areas of computer systems.  Recall the classic
>>arguments about turning off runtime bounds checking to reclaim that
>>lost performance -- why should a programmer, in the best of all
>>possible worlds, have to worry about things like that?  ...
>
>My counterargument is that it is almost as much of an imposition on the
>programmer to have such checks done at runtime as it is to have them not
>done at all.  Given the impossibility of complete testing, there is no
>way to guarantee such tests will catch a bug during development.  This
>means that the programmer has to go through the same complicated exercise
>of trying to assure himself that the problem will never happen.  What is
>really wanted is a way to check these properties at COMPILE TIME... in
>which case the runtime checks become largely superfluous anyway.

We probably agree that the compiler should check as much as it
possibly can; the earlier an error is detected, the less the
ambiguity about what is wrong and the cheaper the cure.  But there is
an awful lot the compiler can't know about -- programs that use input
data, generate pointers, or do just about anything else that's
interesting, are going to do many operations that are difficult for
the compiler to validate at compile time.

I'm not sure I catch your drift on the imposition of runtime checks.
To me, the best human debuggers are the ones who figure out fastest
which of the assumptions being made are wrong, and thus get to the
heart of a problem quickest.  If I have a program bug,
I want the machine to express as directly as possible the intent of
my program so that I can narrow the bug-space down to my code alone.
If the machine allows a change to one variable (an array, say) to
affect some other unrelated variable, then it is not conforming to
the intent of my program.  In fact, it is not conforming to anything
useful at all, since nobody would argue that it is useful programming
practice to ever do such a thing on purpose (I hope?).  Given that,
the fact that the machine can do such a thing, let alone do it as a
default, is what I'd label a shortcoming in the basic architecture.
One we've all long ago learned to live with, to be sure, but it's not
something to be proud of, at very least.

>Can it be done?  Well, in one sense the answer is clearly yes, because a
>proof of program correctness has to include it, and we know that automating
>such proofs is possible (although seldom practical at the moment).  The
>question is whether it can be done with practical, affordable machinery
>without crippling the language.  My own long-held conjecture is that the
>answer is "yes", but I don't have proof of that yet.

I sure hope you're right.  In fact, if progress towards this goal
becomes evident, I'd propose we start discussing ways of turning
architecture towards better ways of supporting this instead of
throughput or programming environments.


Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090

ge@hobbit.sci.kun.nl (Ge' Weijers) (07/25/88)

From article <1087@ficc.UUCP>, by peter@ficc.UUCP (Peter da Silva):
> I have a question:
> 
> 	Why are an Icache plus a Dcache better than just
> 	a big shared cache as big as both?

The answer is bandwidth. The CPU does not have to stop filling the instruction
pipeline when it accesses/writes data.
-- 
Ge' Weijers, Informatics dept., Nijmegen University, the Netherlands
UUCP: {uunet!,}mcvax!kunivv1!hobbit!ge

guy@gorodish.Sun.COM (Guy Harris) (07/26/88)

> We probably agree that the compiler should check as much as it
> possibly can; the earlier an error is detected, the less the
> ambiguity about what is wrong and the cheaper the cure.  But there is
> an awful lot the compiler can't know about -- programs that use input
> data, generate pointers, or do just about anything else that's
> interesting, are going to do many operations that are difficult for
> the compiler to validate at compile time.

In the case of programs that use input data, said programs should validate the
input data before using it (unless they have good reason to know the data will
*never* be incorrect); I would prefer to see

	input, line 12: value "12038" is out of range (should be between
	  17 and 137)

than to see

	ILLEGAL ARRAY REFERENCE: "frobozz.q", line 31
		Subscript was 12038; should be between 17 and 137

as the former, at least, tells me that the ultimate problem is in the data, not
the code.  Could a compiler know to remove run-time subscript checks in:

	i = read();
	if (i < 17 || i > 137)
		fail("input, line %d: value \"%d\" is out of range (should be between 17 and 137)",
		    lineno, i);	/* "lineno": whatever tracks the current input line */
	printf("input value for %d is %d\n", i, array[i]);

If so, it should probably issue warnings when it *isn't* able to remove
run-time subscript checks, to inform programmers that they should probably
put those checks in themselves.

> >Can it be done?  Well, in one sense the answer is clearly yes, because a
> >proof of program correctness has to include it, and we know that automating
> >such proofs is possible (although seldom practical at the moment).  The
> >question is whether it can be done with practical, affordable machinery
> >without crippling the language.  My own long-held conjecture is that the
> >answer is "yes", but I don't have proof of that yet.
> 
> I sure hope you're right.  In fact, if progress towards this goal
> becomes evident, I'd propose we start discussing ways of turning
> architecture towards better ways of supporting this instead of
> throughput or programming environments.

Yes, but does this need "architectural" support, at least in the sense of
"instruction set architecture"?  If a compiler for a "conventional" machine can
do that level of validation, subscript-range checking features in the
instruction set would be seldom, if ever, used.

If "architecture" includes the compiler, then I agree that "architectural"
support for this would be nice.

smryan@garth.UUCP (Steven Ryan) (07/26/88)

>If the machine allows a change to one variable (an array, say) to
>affect some other unrelated variable, then it is not conforming to
>the intent of my program.  In fact, it is not conforming to anything
>useful at all, since nobody would argue that it is useful programming
>practice to ever do such a thing on purpose (I hope?).

This particular example is done all the time in handling recursive data
structures.

>>Can it be done?  Well, in one sense the answer is clearly yes, because a
>>proof of program correctness has to include it, and we know that automating
>>such proofs is possible (although seldom practical at the moment).

Depends on whether all possible programs (including those of monkey
programmers) or just `practical' programs are considered. Formal proofs
of all possible programs are impossible, flat out, no appeal. Formal proofs
of practical programs depend on how powerful practical has to be to remain
useful.

levy@ttrdc.UUCP (Daniel R. Levy) (07/26/88)

In article <1988Jul23.221743.22169@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
> -- 
> Anyone who buys Wisconsin cheese is|  Henry Spencer at U of Toronto Zoology
> a traitor to mankind.  --Pournelle |uunet!mnetor!utzoo! henry @zoo.toronto.edu

First it was NASA, now it's Wisconsin cheese.  What gripe do you (and Pournelle
for that matter) have against Wisconsin cheese?  (The imagination boggles.)
-- 
|------------Dan Levy------------|  THE OPINIONS EXPRESSED HEREIN ARE MINE ONLY
|    AT&T  Data Systems Group    |  AND ARE NOT TO BE IMPUTED TO AT&T.
|        Skokie, Illinois        | 
|-----Path:  att!ttbcad!levy-----|

colwell@mfci.UUCP (Robert Colwell) (07/26/88)

In article <61251@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>In the case of programs that use input data, said programs should validate the
>input data before using it (unless they have good reason to know the data will
>*never* be incorrect); I would prefer to see
>
>	input, line 12: value "12038" is out of range (should be between
>	  17 and 137)
>
>than to see
>
>	ILLEGAL ARRAY REFERENCE: "frobozz.q", line 31
>		Subscript was 12038; should be between 17 and 137
>
>Yes, but does this need "architectural" support, at least in the sense of
>"instruction set architecture"?  If a compiler for a "conventional" machine can
>do that level of validation, subscript-range checking features in the
>instruction set would be seldom, if ever, used.
>
>If "architecture" includes the compiler, then I agree that "architectural"
>support for this would be nice.

But the whole point of capability machines (to name a single example)
is that one cannot cover the space of all interesting
exception-possibilities using only a compiler, no matter how smart.
For one thing, the program could be coercing data types back and
forth such that checking an operation on a type can only be done at
the time the operation is applied.

But a more fundamental matter is how one manages the development of a
large software project with dozens of programmers contributing to a
single large end product.  The modules are all separately compiled,
so there is no question of the compiler helping out much.

Given access to all the symbol tables, you could imagine the linker
doing some reasonable checks of consistency (number and types of
args, for instance), but even that fails when pointers are being
passed (pointers to functions, even).

You can catch a lot of the common cases with good programming style,
as you note above.  But you can't catch them all, and the question
that capability machines pose is "how close can we come to an
airtight programming environment, and how much would it cost"?
(simplistic paraphrase, I know; maybe it'll draw some capability
people out of their woodwork abstraction!)

And how about functional languages?  Again, the compiler can only do
so much, and the space it covers is not intuitively obvious to the
programmer.  If it doesn't entirely cover the bug-space, and you
aren't sure what it *does* cover, then the coverage is much less
useful (and may not be useful at all, as Henry Spencer was alluding
to in an earlier post.)

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090

guy@gorodish.Sun.COM (Guy Harris) (07/27/88)

> >Yes, but does this need "architectural" support, at least in the sense of
> >"instruction set architecture"?  If a compiler for a "conventional" machine
> >can do that level of validation, subscript-range checking features in the
> >instruction set would be seldom, if ever, used.
> >
> >If "architecture" includes the compiler, then I agree that "architectural"
> >support for this would be nice.
> 
> But the whole point of capability machines (to name a single example)
> is that one cannot cover the space of all interesting
> exception-possibilities using only a compiler, no matter how smart.
> For one thing, the program could be coercing data types back and
> forth such that checking an operation on a type can only be done at
> the time the operation is applied.

You missed the point.  As I said in an earlier posting, you can spend some
increased speed by putting in code to do the appropriate checks on a
"conventional" machine.  If the compiler can eliminate most of those checks,
so much the better; you spend less of the increased speed.

I don't see that, for example, doing these checks in microcode, as opposed to
regular code, is *ipso facto* the only way to do things.  In some ways, that's
what I've been asking: what reason is there to believe that capability
machines and the like are the *only* way, or even just the *best* way, to
achieve the kind of programming environment I suspect most, if not all, of us
would like to see?

> But a more fundamental matter is how one manages the development of a
> large software project with dozens of programmers contributing to a
> single large end product.  The modules are all separately compiled,
> so there is no question of the compiler helping out much.
> 
> Given access to all the symbol tables, you could imagine the linker
> doing some reasonable checks of consistency (number and types of
> args, for instance), but even that fails when pointers are being
> passed (pointers to functions, even).

I'm not so sure of that.  I don't see how (assuming a "reasonable" language and
"reasonable" linkers) doing the check at link time is intrinsically any
different from doing it at compile time, if (as per the assumptions listed) the
linker has all the information about types that was available to the compiler.
I also don't see that pointers are that bad a problem, assuming pointers are
typed (again, assuming a "reasonable" language); in ANSI C, for instance,
"pointer to 'int'-valued function taking an 'int' argument" is a different type
from "pointer to 'int'-valued function taking a 'float' argument".

> You can catch a lot of the common cases with good programming style,
> as you note above.  But you can't catch them all, and the question
> that capability machines pose is "how close can we come to an
> airtight programming environment, and how much would it cost"?
> (simplistic paraphrase, I know; maybe it'll draw some capability
> people out of their woodwork abstraction!)

I agree that you may not be able to catch all examples of incorrect code.
However, if you can avoid doing any checking in those cases where you *can*
catch them at compile time, you may be better off.  The further questions that
capability machines pose are:

	1) How much "safer" are they than "conventional" machines plus
	   compilers that generate run-time checks?

and

	2) If this extra safety is expensive, is it worth the cost?

(Remember, the sort of embedded systems to which you've referred in the past do
have real-time constraints, so there is presumably some minimum performance
required; as the jobs they do get more sophisticated, the minimum performance
required increases.)

smryan@garth.UUCP (Steven Ryan) (07/27/88)

In article <1075@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>>If the machine allows a change to one variable (an array, say) to
>>affect some other unrelated variable, then it is not conforming to
>>the intent of my program.  In fact, it is not conforming to anything
>>useful at all, since nobody would argue that it is useful programming
>>practice to ever do such a thing on purpose (I hope?).
>
>This particular example is done all the time in handling recursive data
>structures.

What do I mean by that? [I do not know how to reply by private mail--student
driver.]

A simple example is reversing a singly linked list and, just to make it
interesting, inserting the list ordinal:

p := nil; q := first_element; n := 1;
while q isnt nil
 do
  r := q.link;     -- r and q.link are variables pointing at the location.
  q.link := p;     -- p and q.link are now pointing at the same location.
  p := q;          -- now p and q.
  p.ordinal := n;  -- as a side effect, also modifies q.ordinal.
  q := r;          -- now r and q.
  n +:= 1
 od
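
The same loop in C might look like this (the struct and field names are
assumed); the commented line is where the aliasing shows up, since p and q
point at the same node by then.

struct node { struct node *link; int ordinal; };

struct node *reverse_with_ordinals(struct node *first_element)
{
    struct node *p = 0, *q = first_element, *r;
    int n = 1;

    while (q != 0) {
        r = q->link;        /* r and q->link name the same location */
        q->link = p;        /* now q->link and p                     */
        p = q;              /* now p and q                           */
        p->ordinal = n;     /* side effect: q->ordinal changes too   */
        q = r;              /* now r and q                           */
        n++;
    }
    return p;               /* head of the reversed list */
}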

ge@hobbit.sci.kun.nl (Ge' Weijers) (07/27/88)

From article <1848@van-bc.UUCP>, by sl@van-bc.UUCP (pri=-10 Stuart Lynne):
[About self-modifying code]
> A simple definition might be any program that cannot be compiled and run
> in shared text mode (or equivalent for non Unix application environments).

What about programs that write code into the data segment and then 
call it:

int myproc[1000];

..... generating s/m code

(*((int (*)())myproc))(); /* and calling it */

We need a better definition, I'm afraid.
-- 
Ge' Weijers, Informatics dept., Nijmegen University, the Netherlands
UUCP: {uunet!,}mcvax!kunivv1!hobbit!ge

baum@Apple.COM (Allen J. Baum) (07/28/88)

[]
>In article <61459@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>I don't see that, for example, doing these checks in microcode, as opposed to
>regular code, is *ipso facto* the only way to do things....
 Presumably, hardware (done in parallel with fetches) is the way to go
>... the questions that capability machines pose are:
>
>	1) How much "safer" are they than "conventional" machines plus
>	   compilers that generate run-time checks?
>
>and
>
>	2) If this extra safety is expensive, is it worth the cost?

In some sense, the cost is very little, probably not much more than your 
standard segmentation scheme. The hardware cost is usually a tag bit, which
is fast, pretty cheap on chip, but possibly expensive in your memory system.

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385

colwell@mfci.UUCP (Robert Colwell) (07/28/88)

In article <61459@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>> >Yes, but does this need "architectural" support, at least in the sense of
>> >"instruction set architecture"?  If a compiler for a "conventional" machine
>> >can do that level of validation, subscript-range checking features in the
>> >instruction set would be seldom, if ever, used.
>> >
>> >If "architecture" includes the compiler, then I agree that "architectural"
>> >support for this would be nice.
>> 
>> But the whole point of capability machines (to name a single example)
>> is that one cannot cover the space of all interesting
>> exception-possibilities using only a compiler, no matter how smart.
>> For one thing, the program could be coercing data types back and
>> forth such that checking an operation on a type can only be done at
>> the time the operation is applied.
>
>You missed the point.  As I said in an earlier posting, you can spend some
>increased speed by putting in code to do the appropriate checks on a
>"conventional" machine.  If the compiler can eliminate most of those checks,
>so much the better; you spend less of the increased speed.

Actually, I didn't miss the point, I ignored it because I wanted to
deal with something I consider more important.  Yes you can put in
more checks.  And if you aren't going to let me get anything better
than that for a programming environment, then I'll take it as an
improvement over the C/Unixes that currently exist.

But I have a lot of doubts about this approach.  To me it's like
observing that an earth dam has 300 holes, and proposing to fix them
one-by-one, each hole needing some different fix from the one before.
The whole point of capabilities (at least as the 432 implemented
them) was to provide a seamless programming environment, with exactly
the same abstraction (that of an "object") presented to the
programmer, no matter in which direction she looked (left or right
through the code, or down to the hardware).  You can't get this by
throwing a large number of bandaids at the problem, even if you say
they don't cost much in performance.

>I don't see that, for example, doing these checks in microcode, as opposed to
>regular code, is *ipso facto* the only way to do things.  In some ways, that's
>what I've been asking: what reason is there to believe that capability
>machines and the like are the *only* way, or even just the *best* way, to
>achieve the kind of programming environment I suspect most, if not all, of us
>would like to see?

Actually, I didn't say it was the only way.  In fact, all I've been
trying to argue is that there are reasons to believe that there are
things worth considering from that kind of research and that type of
programming environment.  Sometimes I feel like all we do is debate
the race cars, when it's the production cars that represent all the
money.

[discussion about specific checks elided; this message is too long]

>I agree that you may not be able to catch all examples of incorrect code.
>However, if you can avoid doing any checking in those cases where you *can*
>catch them at compile time, you may be better off.  The further questions that
>capability machines pose are:

You won't catch me disagreeing with this, even for capability
machines.  In fact, that was a major point of our article in the 13th
Computer Arch Symp (I think it was 13 -- the one in Tokyo).

>	1) How much "safer" are they than "conventional" machines plus
>	   compilers that generate run-time checks?
>
>and
>
>	2) If this extra safety is expensive, is it worth the cost?

Absolutely.  I tried to quantify the cost of 432-style object
orientation in my thesis.  In a nutshell, I concluded that that style
of machine would be from 1 to 4 times slower than a conventional
unprotected architecture made of equivalent implementation
technology.  (1 times slower means the same speed.)  That's an order of
magnitude faster than the 432 really was, but that doesn't count (see
next issue of ACM TOCS to see why).

You could do their transparent multiprocessing and buy that factor of
4 back, but if you got above 6 or 7 you'd saturate the bus
(memory-to-memory architecture).  Things get hard to sort out after
that.

Is that performance hit worth it?  Who knows?  I'd say it's probably
worth it a lot more often than most people currently think.  An awful
lot of code got written and debugged on machines that were a lot slower
than the could-have-been 432.  You have to allow for a natural
tendency (which I think largely fuels this whole industry) to always
want more and to refuse to ever go backwards in anything, speed,
memory, disk, sw complexity...

>(Remember, the sort of embedded systems to which you've referred in the past do
>have real-time constraints, so there is presumably some minimum performance
>required; as the jobs they do get more sophisticated, the minimum performance
>required increases.)

Yeah, well, the Space Shuttle doesn't exactly have blazing hardware.
But it's running some of the most sophisticated software I
know of.  And telephone switchers aren't mini-crays by any means.

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090

guy@gorodish.Sun.COM (Guy Harris) (07/28/88)

> But I have a lot of doubts about this approach.  To me it's like
> observing that an earth dam has 300 holes, and proposing to fix them
> one-by-one, each hole needing some different fix from the one before.
> The whole point of capabilities (at least as the 432 implemented
> them) was to provide a seamless programming environment, with exactly
> the same abstraction (that of an "object") presented to the
> programmer, no matter in which direction she looked (left or right
> through the code, or down to the hardware).

What if they look at the microcode, or directly at the hardware (if the
underlying machine isn't implemented using microcode)?  In order for the
environment to be truly airtight, the microcode (or hardware) has to be
airtight.

It may be that producing a sufficiently airtight compiler that implements the
object-oriented model by producing the "right" code, including checks, is
either impossible or more costly than producing a microcode or hardware
implementation atop a compiler that doesn't have to be so airtight because the
microcode or hardware is.  If people's experience points in this direction,
fine.

However, how is this different from, say, implementing the interpreter in some
"conventional" machine language, or in some language compiled into
"conventional" machine language?  If it's not different, could the interpreter
then try compiling the code to be interpreted into that machine language,
preserving the run-time checks?  Could this still be sufficiently airtight?  If
so, would it be faster than, say, a microcode implementation of the same
interpreter with no compilation?  How about a microcode implementation that
compiles either into some faster-to-interpret code, or directly into microcode?

> Actually, I didn't say it was the only way.  In fact, all I've been
> trying to argue is that there are reasons to believe that there are
> things worth considering from that kind of research and that type of
> programming environment.

I'll certainly agree with that; however...

> Sometimes I feel like all we do is debate the race cars, when it's the
> production cars that represent all the money.

I won't agree with that, because

	1) I don't think that's *all* people have been doing.  While there's
	   been relatively little discussion of it in *this* forum, I get the
	   impression - perhaps erroneously - that people are thinking about
	   how to implement the desired kind of programming environment, even
	   if more in the "atop a 'conventional' machine language" form than in
	   the "machine language that does lots of it directly" form; e.g., the
	   SOAR people, people doing LISP implementations on "conventional"
	   machines, etc..

	2) I don't think the analogy is appropriate.  For one thing, most
	   computers are - for better or worse - "conventional", so a better
	   analogy would be "sometimes it feels like all we do is debate the
	   production cars, when we should be concentrating on 'safety' cars",
	   e.g. cars with significantly new features to increase safety.

The second analogy raises some interesting questions:

	1) How much are people willing to spend to have safer cars?  Are they
	   willing to pay X amount of extra money for airbags?  (How much of a
	   slowdown are people willing to accept for some amount of additional
	   run-time checking?)

	2) How much safer does a particular feature make you?  (If 99% of
	   the errors a particular run-time check detects can be caught at
	   compile-time, how much safer are you if you catch the extra 1%?)

	3) How many problems does a feature have that reduce its desirability -
	   or even its contribution to safety?  For example, if there's a risk
	   of airbags detonating spontaneously, what is the risk of this
	   happening and how bad is it if it does?  (What are the chances that
	   the implementation of the environment, and all its checking, has
	   a bug in it, and how badly are you screwed if it is?)

> Absolutely.  I tried to quantify the cost of 432-style object
> orientation in my thesis.  In a nutshell, I concluded that that style
> of machine be from 1 to 4 times slower than a conventional
> unprotected architecture made of equivalent implementation
> technology.

Just out of curiosity:

How would this figure change with newer implementation technologies for
"conventional" machines (e.g.  more powerful optimizers, different
architectural style (for which you may read "the stuff that RISC implementors
talk about a lot"), etc.?

How would it change with newer implementations for protected architectures
(including some of the same newer technologies mentioned for "conventional"
machines, and also techniques such as compiling "protected architecture" code
into "conventional" code, or doing "quick" checks in hardware and trapping to
software checking code if they fail)?  (For all I know, some of these may not
be "newer" techniques; if so, how did that change the relative performance?)

How easy is it to apply some of those techniques to instruction-set
architectures such as the 432's, as opposed to applying them to, say, RISC
architectures and using other techniques to implement a "protected" system atop
such a "conventional" architecture?

> You could do their transparent multiprocessing and buy that factor of
> 4 back, but if you got above 6 or 7 you'd saturate the bus
> (memory-to-memory architecture).  Things get hard to sort out after
> that.

Yes, but could you buy an equivalent amount back with multiprocessing on
"conventional" architectures with the protection implemented elsewhere than in
the instruction set, even if the multiprocessing might be less transparent?
Could it be a greater amount if the "conventional" architecture were not so
memory-reference-intensive?

Could you change the "protected" system to make *it* less memory-intensive and
improve *its* multiprocessor performance?  (At this point, one would no longer
expect eyebrows to raise at the appearance of a new register-oriented
machine.... :-))

> Is that performance hit worth it?  Who knows?  I'd say it's probably
> worth it a lot more often than most people currently think.  An awful
> lot of code got written and debugged on machines that were a lot slower
> than the could-have-been 432.  You have to allow for a natural
> tendency (which I think largely fuels this whole industry) to always
> want more and to refuse to ever go backwards in anything, speed,
> memory, disk, sw complexity...

Yes, but there may be an economic justification for that tendency; the issues
may not all be "technical" in the narrow sense.  It may be hard, as you pointed
out, to show improvements in programmer productivity, code reliability, etc.
in a better programming environment, but if the advocates of such environments
want people to spend money on those sorts of improvements rather than straight
performance improvements, they should expect those who would pay the bills to
be more swayed by that kind of evidence.

(Another problem is that it may not even be cost-effective to adopt a
technology that's clearly "better", technically; I suspect a new motor fuel
would have to be quite a bit better before people were willing to write off the
existing investment in the production and distribution of existing fuels.
There are a lot of "conventional" machines out there.)

> Yeah, well, the Space Shuttle doesn't exactly have blazing hardware.
> But it's running some of the most sophisticated software I
> know of.  And telephone switchers aren't mini-crays by any means.

Yeah, well, I don't know that either of those processors would necessarily have
the requisite horsepower for some of the embedded applications somebody might
want to put computers into in the future.  That's why I said "as the jobs they
do get more sophisticated, the minimum performance required increases;" a 4x
performance hit relative to "conventional" computers may rule out a "protected"
processor for some application until you can either come up with a way of
solving the problem faster with existing hardware or make the hardware faster.

I'm certainly willing to believe that there are applications now that could use
a "protected" processor, but I'm not sure I'm willing to believe that there
aren't applications that, at present, couldn't use a "protected" processor if
it were 4x slower than a "conventional" one.  In this case, a useful question
to ask might be "can I get the same level of protection with an adequate level
of performance if I don't put all the protection in the instruction set?"

(BTW, I infer from the paper "Engineering a RISC Compiler System", by some
people at MIPS, that their compiler can optimize register usage across
procedure boundaries even when the calling and called procedures were
separately compiled - they keep Ucode around for multiple compilation units, so
that information for both calling and called procedure is available to the
optimizer - so the claim made in "Fast Object-Oriented Procedure Calls:
Lessons from the Intel 432" that "Clearly, the same approach (interprocedural
register allocation) is inapplicable to separately compiled modules." is
false.  It may be true in some cases - for example, if the modules are bound
together at run time - but it is hardly clear that it is always true.  I
believe David Wall has worked on link-time register allocation as well.)

colwell@mfci.UUCP (Robert Colwell) (07/28/88)

In article <61781@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
[Guy, how come you always delete the previous correspondent's name?
Are you trying to save me embarrassment?]

[lots of interesting questions and points to ponder removed...I think
this forum has run out of gas on a topic this complex, so in true
Usenet style I'm going to pick nits instead.  I'll try to get back to 
your 432 questions when I get a chance.]

>I'm certainly willing to believe that there are applications now that could use
>a "protected" processor, but I'm not sure I'm willing to believe that there
>aren't applications that, at present, couldn't use a "protected" processor if
>it were 4x slower than a "conventional" one.  In this case, a useful question
>to ask might be "can I get the same level of protection with an adequate level
>of performance if I don't put all the protection in the instruction set?"

If you think of the 432 as merely a "protected processor" you've not
fully grasped what they created there.  It was an *environment*, of
which the central processor was just a component.  You, as a
programmer, did not see the 432 programming environment as a
processor running with a certain amount of memory, a set of
registers, some available OS calls, a certain type of virtual memory,
etc.  You the programmer were given a new universe in which all the
pieces obeyed the same rules, and the rules were simple and few and
not derived from hardware or language artifacts (by and large,
anyway).  I do not believe you can get this environment piecemeal.
(But you can get closer to it than we currently are, and we probably
agree on both the feasibility and desirability of this.)

>(BTW, I infer from the paper "Engineering a RISC Compiler System", by some
>people at MIPS, that their compiler can optimize register usage across
>procedure boundaries even when the calling and called procedures were
>separately compiled - they keep Ucode around for multiple compilation units, so
>that information for both calling and called procedure is available to the
>optimizer - so the claim made in "Fast Object-Oriented Procedure Calls:
>Lessons from the Intel 432" that "Clearly, the same approach (interprocedural
>register allocation) is inapplicable to separately compiled modules." is
>false.  It may be true in some cases - for example, if the modules are bound
>together at run time - but it is hardly clear that it is always true.  I
>believe David Wall has worked on link-time register allocation as well.)

We were talking about procedure calls made within an object-based
programming environment, and I don't think the claim is false.  In
such an environment you do not want to tie your parameter-passing
abstraction, with its expected runtime type-checking, to whatever
register convention happens to be the current fashion.  We didn't say
you couldn't do it -- we meant you don't want to.

Bob Colwell            mfci!colwell@uunet.uucp
Multiflow Computer
175 N. Main St.
Branford, CT 06405     203-488-6090

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/29/88)

In article <2079@u1100a.UUCP> krohn@u1100a.UUCP (Eric Krohn) writes:
>In article <8239@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>] As a significant example of C's efficiency for graphics programming,
>] virtually all the code in the Blit, DMD (5620), and MTG (630)
>] bitmapped terminals was written in C, and their graphics operations
>] are extremely fast.  No self-modifying code was necessary.
>Virtually all?  Yes.  The most critical?  No.
>The bottom-most level of all screen updates is the infamous
>bitblt function.  A 1982 BTL memo by John Reiser tells about the on-the-fly
>code generation done by bitblt to achieve rather good performance on the
>original Blit.  I've assumed that similar code was carried forward to the 630
>and the 5620 (even with the CPU switch).

Rob tells me that the 5620 doesn't use this trick.  (There is a chance
that the 630 does; I don't know.)  In the Feb 1985 Pike & Locanthi paper
in SP&E (V15, p131-151) they explain how this trick was done.  They also
mention that the savings over their best C version average about 20%.
Of course, bitblt is "bottleneck code", so every little bit helps, but
on-the-fly generation of instructions isn't essential (the first three
Blit bitblt() implementations were all written in C).

That was admittedly a risky example; bottleneck code often ends up
using some trick to speed things up.

henry@utzoo.uucp (Henry Spencer) (07/30/88)

In article <477@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
>... there is
>an awful lot the compiler can't know about -- programs that use input
>data, generate pointers, or do just about anything else that's
>interesting, are going to do many operations that are difficult for
>the compiler to validate at compile time.

My personal thesis is that the difficulty of doing this in a well-written
program -- I am not interested in poorly-written ones -- is consistently
overestimated.  Remember, the programmer is supposed to be sure that these
errors aren't going to happen, because even if they are caught immediately
they are generally a disaster for production code.  Frequently, the sort
of checks that a programmer must put in to be sure about such things are
the sort of thing that a compiler ought to be able to spot fairly easily,
e.g. validation of input data.
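
As a minimal C sketch of the kind of check meant here (the function name and
the buffer size are made up for illustration), the validation the programmer
has to write anyway is exactly what lets a reader (or a compiler) see that
the subscript can never go out of range:

	#define BUFSIZE 128
	static char buf[BUFSIZE];

	/* store c at position n; returns -1 for a bad subscript */
	int store(int n, char c)
	{
		if (n < 0 || n >= BUFSIZE)	/* input validation */
			return -1;
		buf[n] = c;			/* provably in range here */
		return 0;
	}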

>I'm not sure I catch your drift on the imposition of runtime checks.
>To me, the best human debuggers have the ability to figure out the
>fastest which assumptions are being made that are wrong, thus getting
>to the heart of some problem the quickest.  If I have a program bug,
>I want the machine to express as directly as possible the intent of
>my program so that I can narrow the bug-space down to my code alone.

We may have a slight misunderstanding here, which I admit I didn't notice
in my original posting.  I'm not claiming that run-time checks aren't
useful during debugging.  However, I claim that run-time checks are not
a particularly good approach to catching problems during production use,
because "subscript error in line 32, core dumped" can be every bit as
bad as trash output in a production environment.  What one really wants
is a *guarantee* that (in the absence of hardware malfunction), it will
not happen.  This cannot be achieved by testing, regardless of how much
help the hardware provides.

>I sure hope you're right.  In fact, if progress towards this goal
>becomes evident, I'd propose we start discussing ways of turning
>architecture towards better ways of supporting this instead of
>throughput or programming environments.

This is actually a programming-environment issue, though, and architecture
is nearly irrelevant.  A much more important issue is the programming
languages being used, and how tractable they are.  C, for example, is not
a very favorable case -- too many problems with pointers.
-- 
MSDOS is not dead, it just     |     Henry Spencer at U of Toronto Zoology
smells that way.               | uunet!mnetor!utzoo!henry henry@zoo.toronto.edu

oconnor@nuke.steinmetz (Dennis M. O'Connor) (07/30/88)

An article by ge@hobbit.sci.kun.nl (Ge' Weijers) says:
] From article <1087@ficc.UUCP>, by peter@ficc.UUCP (Peter da Silva):
] > 	Why are an Icache plus a Dcache better than just
] > 	a big shared cache as big as both?
] 
] The answer is bandwidth. The CPU does not have to stop filling
] the instruction pipeline when it accesses/writes data.
] -- 
] Ge' Weijers, Informatics dept., Nijmegen University, the Netherlands

This is indeed part of the answer. The other part is that
instruction-fetch behavior is much simpler and more predictable
than operand fetch, and that I-caches don't need to be
written-through, allowing an isolated I-cache to be simpler
and more effective than a combined operand/instruction cache.

Specialized I-caches, like a Branch-Target cache combined with pre-fetch,
can produce effective hit rates of 99%+ with only a few thousand
bits of storage. Operand caches can't do anywhere near this well.
--
 Dennis O'Connor   oconnor%sungod@steinmetz.UUCP  ARPA: OCONNORDM@ge-crd.arpa
    "Never confuse USENET with something that matters, like PIZZA."

henry@utzoo.uucp (Henry Spencer) (07/30/88)

In article <479@m3.mfci.UUCP> colwell@mfci.UUCP (Robert Colwell) writes:
>But the whole point of capability machines (to name a single example)
>is that one cannot cover the space of all interesting
>exception-possibilities using only a compiler, no matter how smart.
>For one thing, the program could be coercing data types back and
>forth such that checking an operation on a type can only be done at
>the time the operation is applied.

Yes, it could be -- but how does the *programmer* know it's right?
Remember, programs are not supposed to dump core in production operation.
Remember also Dijkstra's comments on how human minds are actually fairly
simple and find it difficult to reason about complex situations.  There
are three possibilities here:

1. Code is simple enough for human programmer to figure out easily, in
	which case the compiler should be able to do likewise, perhaps
	with a bit of help.

2. Code is wrong, human programmer is (e.g.) not validating his inputs
	properly.  Compiler should reject it.

3. The code really is right but very subtle.  Programmer has to have a
	way to tell the compiler "this is subtle but right".  This should
	not be the default, as it effectively is now.

My conjecture, not proven at the moment, is that only modest intelligence
in the compiler and occasional tidying-up of code for clarity would be
needed to make case 3 fairly rare, given favorable circumstances (it is
not clear that C constitutes a sufficiently favorable circumstance).

>But a more fundamental matter is how one manages the development of a
>large software project with dozens of programmers contributing to a
>single large end product.  The modules are all separately compiled,
>so there is no question of the compiler helping out much.

Please consider modern compiler technology, not 20-year-old versions.
Ways to achieve inter-module checking are a major concern in modern
languages, as witness (e.g.) C++ and Modula 2.  It can be done, and
ought to be done!

>Given access to all the symbol tables, you could imagine the linker
>doing some reasonable checks of consistency (number and types of
>args, for instance), but even that fails when pointers are being
>passed (pointers to functions, even).

If the assumptions being made about those pointers are simple ones, they
should be in the code, for compilers (and humans!) to see.  If they are
complex ones, then in a multi-programmer project they are probably being
violated already.
-- 
MSDOS is not dead, it just     |     Henry Spencer at U of Toronto Zoology
smells that way.               | uunet!mnetor!utzoo!henry henry@zoo.toronto.edu

mcdonald@uxe.cso.uiuc.edu (07/30/88)

>This is indeed part of the answer. The other part is that
>instruction-fetch behavior is much simpler and more predictable
>than operand fetch, and that I-caches don't need to be
>written-through, allowing an isolated I-cache to be simpler
>and more effective than a combined operand/instruction cache.

This statement, unfortunately, connects up with the recurrent self-
modifying-code theme. SEPARATE I and D caches can result in incorrect
program execution if entries in the I cache are not invalidated when the
D cache gets written at the same location. Fixing this requires either
hardware that does the invalidation, or a "flush-cache" instruction. I strongly
prefer that hardware do this sort of housekeeping. It is OK if
there is a "dead period" of a reasonable number of cycles or
instructions during which the I cache is wrong. Some machines,
however, seem to be able to get this down to zero.
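
A minimal C sketch of the situation, assuming an x86-style target, a GCC-style
__builtin___clear_cache() (on other systems this would be a machine-specific
flush instruction or system call), and an OS willing to let you execute out of
this buffer:

	static unsigned char code[16];

	void patch_and_run(void)
	{
		code[0] = 0xC3;		/* x86 "ret", written through the D cache */

		/* without a flush of some kind, a separate I cache may still
		   hold whatever bytes used to be at this address */
		__builtin___clear_cache((char *)code, (char *)code + sizeof code);

		((void (*)(void))code)();	/* jump into the freshly written code */
	}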

Doug McDonald

mike@arizona.edu (Mike Coffin) (07/31/88)

From article <1988Jul29.202400.28068@utzoo.uucp>, by henry@utzoo.uucp (Henry Spencer):
> ...  There are three possibilities here:
> 
> 1. Code is simple enough for human programmer to figure out easily, in
> 	which case the compiler should be able to do likewise, perhaps
> 	with a bit of help.
> 
> 2. Code is wrong, human programmer is (e.g.) not validating his inputs
> 	properly.  Compiler should reject it.
> 
> 3. The code really is right but very subtle.  Programmer has to have a
> 	way to tell the compiler "this is subtle but right".  This should
> 	not be the default, as it effectively is now.

4. Code is simple enough for a human programmer to figure out easily
given "deep" knowledge about what the program is doing, in which case
a compiler has big troubles figuring out almost anything.  I have
worked on some mathematical software --- computational group theory,
if it matters --- where the code itself looks broken unless you know
the right theorem.  The theorem establishes an invariant that makes
the code very easy to understand.

I just thought of a simple example: a piece of code relies on the fact
that an array always contains a permutation of the integers 1:N.  This
invariant is originally established by generating a random permutation
via a very clever algorithm (Floyd's), and then maintained by always
permuting the integers --- sometimes swapping them, other times
rotating several.  A procedure that relies on the fact that the array
is a permutation is going to be extremely opaque to the compiler, yet
might be transparent to a programmer.  (Especially if the array is
called "permutation".)
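
A hedged C sketch of that example (it uses the ordinary Fisher-Yates shuffle
rather than Floyd's algorithm, and all the names are invented): the inversion
routine below never indexes inv[] out of range, but only because of the
permutation invariant, which the programmer knows and the compiler almost
certainly cannot deduce.

	#include <stdlib.h>

	#define N 100

	/* establish the invariant: perm[] holds a permutation of 1..N */
	void init_perm(int perm[N])
	{
		int i, j, t;

		for (i = 0; i < N; i++)
			perm[i] = i + 1;
		for (i = N - 1; i > 0; i--) {	/* Fisher-Yates shuffle */
			j = rand() % (i + 1);
			t = perm[i]; perm[i] = perm[j]; perm[j] = t;
		}
	}

	/* relies on the invariant: perm[i] - 1 is always in 0..N-1 */
	void invert_perm(const int perm[N], int inv[N])
	{
		int i;

		for (i = 0; i < N; i++)
			inv[perm[i] - 1] = i + 1;
	}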

-- 

Mike Coffin				mike@arizona.edu
Univ. of Ariz. Dept. of Comp. Sci.	{allegra,cmcl2,ihnp4}!arizona!mike
Tucson, AZ  85721			(602)621-4252

henry@utzoo.uucp (Henry Spencer) (08/04/88)

In article <6490@megaron.arizona.edu> mike@arizona.edu (Mike Coffin) writes:
>4. Code is simple enough for a human programmer to figure out easily
>given "deep" knowledge about what the program is doing, in which case
>a compiler has big troubles figuring out almost anything...

Well, remember, I'm not talking about proof of correctness; all I want
is absence of runtime errors.  Even so, I concede that this sort of thing
is likely to happen now and then.  Having to turn the checking off once
in a while (and, preferably, having to justify this in writing) still
strikes me as better than not having it at all.
-- 
MSDOS is not dead, it just     |     Henry Spencer at U of Toronto Zoology
smells that way.               | uunet!mnetor!utzoo!henry henry@zoo.toronto.edu

atbowler@watmath.waterloo.edu (Alan T. Bowler [SDG]) (08/15/88)

In article <61781@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>
>What if they look at the microcode, or directly at the hardware (if the
>underlying machine isn't implemented using microcode)?  In order for the
>environment to be truly airtight, the microcode (or hardware) has to be
>airtight.
>
>It may be that producing a sufficiently airtight compiler that implements the
>object-oriented model by producing the "right" code, including checks, is
>either impossible or more costly than producing a microcode or hardware
>implementation atop a compiler that doesn't have to be so airtight because the
>microcode or hardware is.  If people's experience points in this direction,
>fine.
>
Perhaps this is the wrong question.  There is also the danger that the
existence of the mandatory checks will increase program complexity
as the programmer finds that he has to program around the environment
to accomplish the task.  A classic example is those Pascal programs
that allocate 1 large static linear array and then run their own
memory allocator within it.  Instead of machine pointers they
pass around indexes into the array.  This has defeated the purpose
of the Pascal tight checking, reintroduced all the old danger of
running off the end of your allocated memory (into a chunk logically
attached to another pseudo pointer), and increased the notational complexity
of the program (and probability of errors).  If you examine these programs
and what they accomplish, you will usually conclude 
  1) This was the best choice that could be made given the constraints
     of the language
  2) The programmer must have been persistent to get the thing working
     at all.
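
For concreteness, the same pattern in C (the original complaint is about
Pascal, but the shape is identical; the names and pool size are invented):
one big static array, a home-made allocator, and integer indexes standing in
for pointers, so the language's checking no longer knows where one "object"
ends and the next begins.

	#define POOLSIZE 10000
	static int pool[POOLSIZE];
	static int next_free = 1;	/* index 0 plays the role of nil */

	/* hand out n cells from the pool; the "pointer" is just an index */
	int alloc_cells(int n)
	{
		int p;

		if (n <= 0 || next_free + n > POOLSIZE)
			return 0;		/* out of "memory" */
		p = next_free;
		next_free += n;
		return p;
	}

	/* a dereference is pool[p + field]; nothing stops p + field from
	   wandering into the cells of a neighbouring block */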

lisper-bjorn@CS.YALE.EDU (Bjorn Lisper) (08/22/88)

In article <20349@watmath.waterloo.edu> atbowler@watmath.waterloo.edu (Alan
T. Bowler [SDG]) writes:
>.....  A classic example is those Pascal programs
>that allocate 1 large static linear array and then run their own
>memory allocator within it.  Instead of machine pointers they
>pass around indexes into the array.  This has defeated the purpose
>of the Pascal tight checking, reintroduced all the old danger of
>running off the end of your allocated memory (into a chunk logically
>attached to another pseudo pointer), and increased the notational complexity
>of the program (and probability of errors).  If you examine these programs
>and what they accomplish, you will usually conclude 
>  1) This was the best choice that could be made given the constraints
>     of the language
>  2) The programmer must have been persistent to get the thing working
>     at all.

Just a comment on this: this is the typical technique for implementing
pointers and dynamic memory handling in Fortran, which, unlike Pascal,
doesn't have pointer types and dynamic memory allocation as language
primitives.  Are "those Pascal programs" maybe translated Fortran programs?

Bjorn Lisper

Robert_P_Rubendunst@cup.portal.com (08/30/88)

Every operating system that loads code into RAM and then executes it is
that nasty old self modifying code.

Every cache on every computer should be flushable by CPU instruction.

rdubey@eecs.wsu.edu (Rakesh Dubey - grad student) (10/09/90)

>>> svissag@hubcap.clemson.edu
>>> Re: Two Questions.

About declaring the structure: it is possible, but I can see no
reason why you would want to do that.  Things like self-modifying code
are simply not worth it.  How would you debug/understand/prove
code that modifies itself?  And since you are using a higher-level
programming language, would you modify the C source statements
and recompile everything?
Coming back to the original point, this is the way you *can* do it.
But it sure is stupid.
1. Read the name of the structure and the kind of things the user wants
to be in it.  (Range, type, limits, etc.)
2. Create a file containing C source code which defines such a structure.
3. Now exec a Makefile which compiles your source code file and
this brand new one you created just now.  You will finally have a new
process running.
This might be called self-modifying higher-level code, and many extensions
come to mind.  But there will be a bunch of problems too.  Though I *think*
you can do it, you shouldn't need such tricky/unreliable stuff.
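
A minimal sketch of those steps in C (the file name, program name, and make
target are all invented for illustration):

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(void)
	{
		FILE *fp = fopen("userstruct.h", "w");	/* step 2: emit the type */

		if (fp == NULL)
			return 1;
		fprintf(fp, "struct user_defined { int value; };\n");
		fclose(fp);

		if (system("make newprog") != 0)	/* step 3: rebuild */
			return 1;

		execl("./newprog", "newprog", (char *)0); /* become the new process */
		return 1;			/* reached only if execl fails */
	}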
-- 

---
Rakesh Dubey
rdubey@yoda.eecs.wsu.edu

peter@neccan.oz (Peter Miller) (10/09/90)

in article <1990Oct08.183922.12541@eecs.wsu.edu>, rdubey@eecs.wsu.edu (Rakesh Dubey - grad student) says:
> Though I *think* you can do it, you shouldn't need
> such tricky/unreliable stuff.

I have used self-modifying code on a number of occasions,
and while I don't advise its use as a general rule,
I think it has a place in every programmer's box of tricks.
Examples:

1. On a Z80 I wrote some code which used a NMI (non-maskable interrupt).
   The problem was that the Z80 has no 16-bit indirect load opcode,
   only 8-bit indirect, and (for various reasons which would take ages to
   describe) it was not possible to avoid doing the 16-bit indirect load.
   Murphy's Law sprang into play, (I was doing the 16-bit indirect as 2 8-bit
   indirects) and the NMI, every few hours, fell between the 2 8-bit indirects,
   giving an impossible result and causing all sorts of weird behaviour.
   I solved it by declaring an array of 4 chars and setting it to the absolute
   load opcode, the 16-bit address, and a ret opcode.  Then I cast the array
   pointer to a function pointer and called it.  Voila, a 16-bit indirecting
   function.
	char barf[4];
	barf[0] = 0x2A;			/* Z80 "ld hl,(nn)" opcode */
	barf[1] = addr & 0xff;		/* address low byte */
	barf[2] = (addr >> 8) & 0xff;	/* address high byte */
	barf[3] = 0xC9;			/* Z80 "ret" opcode */
	result = (*(char *(*)())barf)();

2. On a pseudo-emacs editor I wanted real regular expressions,
   so I wrote an RE compiler in Lisp which produced Lisp as output,
   and promptly executed that output.  Lisp makes self-modifying code easy.

3. A very long time ago I used an apple ][ (skeleton rattles).
   Disassembling the disk driver is enlightening.  The disk io needed every
   last cycle from the machine, and so the code first calculated the hardware's
   memory addresses and poked it into the absolute load instructions for the
   rest of the disk io code, thus getting better performance from the
   code (abs, rather than register indexed) and freeing up a register, too.

4. At some point, I realized that using a compiler is rather like
   self-modifying code.  The compiler, itself a binary data file, chews on a
   text file and makes a binary data file.  When we run the program we just
   compiled, we are asking the OS to load a binary data file and leap into it.

Regards
Peter Miller	UUCP	uunet!munnari!pdact.pd.necisa.oz!pmiller
/\/\*		CSNET	pmiller%pdact.pd.necisa.oz@au
		ACSnet	pmiller@pdact.pd.necisa.oz
Disclaimer:  The views expressed here are personal and do not necessarily
	reflect the view of my employer or the views of my colleagues.

jc@atcmp.nl (Jan Christiaan van Winkel) (10/12/90)

From article <829@neccan.oz>, by peter@neccan.oz (Peter Miller):
> 1. On a Z80 I wrote some code which used a NMI (non-maskable interrupt).
This reminds me of the code used in the 8080 BASIC interpreter by Microsoft.
They had several entries into the error routine.  The error routine expected
an error number in register b.  Now what they had done was:
ld hl,<some number>       ; register pair hl gets the value <some number>
ld hl,<some other number>
ld hl,<some other number>
and so on.
The 16-bit numbers themselves were actually instructions:
ld b,errorcode

By jumping into the middle of one of the ld hl,... instructions, they would
load the error code into b, and then execute some dummy ld hl,... instructions
that would not clobber the value in b, even though the ld b,xxx instructions
were just a byte away.
Although this is not self-modifying code, it is 'shifting the bits a bit and
interpreting the result'.  Very clever.
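
A hedged illustration of the byte layout, written as a C array purely so it
can be annotated (the error codes are invented; 0x21 is the 8080/Z80 "ld hl,nn"
opcode and 0x06 is "ld b,n"): jumping to the bytes marked as an entry executes
one real ld b, after which every following 0x21 swallows the next entry's two
bytes as its operand, so b still holds the right code when execution falls
into the common error routine.

	unsigned char err_table[] = {
		0x21,			/* ld hl,nn: its operand is...           */
			0x06, 1,	/* ...really "ld b,1", entry for error 1 */
		0x21,			/* dummy ld hl,nn swallows the next two  */
			0x06, 2,	/* "ld b,2", entry for error 2           */
		0x21,
			0x06, 3,	/* "ld b,3", entry for error 3           */
		/* the common error routine follows; b holds the error code */
	};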

> 4. At some point, I realized that using a compiler is rather like
>    self-modifying code.  The compiler, itself a binary data file, chews on a
>    text file and makes a binary data file.  When we run the program we just
>    compiled, we are asking the OS to load a binary data file and leap into it.
Hmmm. I think you should read Ken Thompson's Turing award lecture. He
discussed the possibility of getting code into a C compiler without having it
in the source. The trick is illustrated with the addition of a new escaped
character like \n. In the lexical analyzer there is some sort of code like this:
case '\\': switch(getnewchar()) {
   case 'n': return '\n';
   case 'a': return '\007';    /* the newly added character */
			       /* my name's Bond, James Bond :-) */
   .
   .

Now compile the compiler, and you'll have a new compiler that recognizes '\a'.
Now edit the source code to look like this:
    case 'a': return '\a';

This is possible because the compiler will be compiled with the compiler that
knows about '\a'. The result is a C compiler that knows that '\a' is in
reality '\007', but nowhere in the source of the C compiler is that knowledge
stored. It is inherited from the previous generation of the C compiler.

JC
-- 
___  __  ____________________________________________________________________
   |/  \   Jan Christiaan van Winkel      Tel: +31 80 566880  jc@atcmp.nl
   |       AT Computing   P.O. Box 1428   6501 BK Nijmegen    The Netherlands
__/ \__/ ____________________________________________________________________