[comp.sys.m88k] Proposed Change to memctl

jkf@Franz.COM (Sean Foderaro) (12/15/90)

 Hello,
 
    I work at Franz Inc. porting Allegro Common Lisp to various architectures.
I've run into a problem making our 88k version of Lisp compliant with
the 88open standard.  I believe that the problem lies in the 88open standard
and I'd like to see that standard changed.  While in theory a single person
with a reasonable argument should be able to effect a change, in practice
I suspect that it will actually need a large group of people in favor
of the change for the 88open consortium to take notice.  Thus I'm
soliciting people to join with me to change the standard.

    This is the problem:  a memory page can only be in one of three
states: (1) read-execute,
        (2) read-write,
        (3) read
	
Note that read-write-execute is not an option, and for Lisp we need
read-write-execute (at least in a limited form) for many of our data pages.

 Let's examine current practice among computer architectures (for it
was current practice that led us to design a memory allocation scheme that
turned out to be incompatible with the 88open spec).  Based on our
current Lisp ports:

Systems that permit read-write-execute:
    68010, 68020, 68030
    vax
    386
    370
    cray
    cray 2
    sparc
    Aviion (88k)  [In our experience we don't need to cache flush on this
		   machine, others may have different experiences]
		   
Systems that permit read-write-execute but require the user to explicitly
cache flush:

    mips
    rs/6000
    motorola 188 board (88k)
    
Systems that don't permit read-write-execute at all:
    motorola 181 board (88k) [follows official 88open spec]



 The 88open consortium has taken a bold stand by
refusing to support a mode of operation that its competitors support
(and in fact that nearly every machine before it has supported).
Note that some 88k systems support read-write-execute, but because
this isn't a specified 88open extension, no program that depends on
that extension can be certified by the 88open process.

 You might ask why we don't just change our Lisp to obey the 88open spec.
Part of the reason is that it would be a lot of work.  We have a very
complex storage management/garbage collection scheme that has taken
many man-years to write.  It treats executable code like any other
data object and it garbage collects code that will never be referenced
again.   Changing this collector so that it segregates code vectors from
other data objects would add another layer of complexity around what is
already a very complex piece of code.  Such added complexity would just make
things worse on our other ports.  And furthermore we don't see a huge number of
88k machines that require this change (in fact we've only come across
one old-style Motorola development board that requires compliance
with this part of the 88open spec).
   A second reason we don't like this part of the spec is that 88open
is setting a dangerous precedent (new computers are supposed to increase
functionality not reduce it!).  We don't want the functionality of
future systems compromised by restrictions like this one as it will inhibit
what software can be designed.

  What we propose:

    We don't need fine-grain read-write-execute where you write a word
  in memory and then jump to it and execute it.

    We propose a very coarse-grain read-write-execute that just forces
  the instruction and data caches to synchronize when this action
  is explicitly requested by the program.

    For those with access to the memctl() manual page the change is
    exactly this:
  
    To the table of states, add a new state,
        state 4, which is readable, writeable and executable.
  

    To the text description add:
    
        After memctl() is called with a state argument of 4,
    the region is readable, writeable and executable.  However if any byte
    within a (4 byte) word in this region is written, then the
    subsequent execution of that word is undefined.


  The change implies that in order to write code into memory and then
  execute it, you must call memctl with a state of 4 between the time
  the code is written to memory and the time it is executed.  Thus we
  don't need the instruction cache snooping the data cache.  All we
  need is for the operating system to be able to flush a region of
  the data cache to memory and to be able to clear a region of the
  instruction cache.  I think that every version of Unix that does
  paging will already have this capability.
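
A minimal sketch of the write / memctl(4) / execute sequence described above,
using a toy model rather than real 88k hardware. The ToyMachine class, its
method names, and the memctl(base, nwords, state) argument order are all
assumptions made for illustration; the real memctl() signature is whatever the
88open manual page specifies. Addresses here are word indices, and a stale
fetch is shown as just one possible outcome of what the proposed text calls
"undefined".

```python
# State numbers follow the post: 1..3 exist today, 4 is the proposed addition.
READ_EXECUTE, READ_WRITE, READ_ONLY, READ_WRITE_EXECUTE = 1, 2, 3, 4


class ToyMachine:
    """Toy CPU with separate instruction and data caches that do not snoop."""

    def __init__(self):
        self.memory = {}  # word address -> value held in main memory
        self.dcache = {}  # dirty words written but not yet flushed to memory
        self.icache = {}  # words already fetched for execution

    def store(self, addr, value):
        # A data write lands in the write-back data cache only.
        self.dcache[addr] = value

    def fetch(self, addr):
        # An instruction fetch checks the instruction cache, then memory.
        # It never looks in the data cache.
        if addr not in self.icache:
            self.icache[addr] = self.memory.get(addr)
        return self.icache[addr]

    def memctl(self, base, nwords, state):
        # Hypothetical stand-in for memctl(); only state 4 is modelled.
        if state != READ_WRITE_EXECUTE:
            raise ValueError("only state 4 is modelled in this sketch")
        for addr in range(base, base + nwords):
            if addr in self.dcache:
                # Flush this word of the data cache to memory.
                self.memory[addr] = self.dcache.pop(addr)
            # Clear this word from the instruction cache.
            self.icache.pop(addr, None)


def demo():
    m = ToyMachine()
    m.memory[100] = "old"
    m.memctl(100, 1, READ_WRITE_EXECUTE)
    first = m.fetch(100)   # instruction cache now holds "old"
    m.store(100, "new")    # new code written through the data cache
    stale = m.fetch(100)   # no memctl yet: this model returns the old word
    m.memctl(100, 1, READ_WRITE_EXECUTE)
    fresh = m.fetch(100)   # caches synchronized: the new word is fetched
    return first, stale, fresh


if __name__ == "__main__":
    print(demo())
```

In this model the only work memctl does for state 4 is a data-cache flush and
an instruction-cache clear over the region, which is the cost argument made
above.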


I'd like those who support this change to contact me (via email,  phone,
or comp.sys.m88k). Please mention why you want it (e.g. you have an application
that requires it or you simply believe that it is correct).

I'd also like to hear from hardware and operating system people who believe
that implementing this would greatly slow down or otherwise hamper programs
running on the 88k, even if they didn't use read-write-execute pages.

 

- John Foderaro		voice: 415 548 3600
  Franz Inc		  fax: 415 548 8253
		        email: jkf@franz.com   -or- uunet!franz!jkf

jkf@Franz.COM (Sean Foderaro) (12/20/90)

 
>> From: hamilton@dg-rtp.dg.com (Eric Hamilton) 
>>  How about a new trap:
>>	r2 contains the base address
>>	r3 contains the length
>>
>>	tb0 0,r0,<CacheSynchronizationTrap>

>> Will cause the data and instruction caches for the specified region (between
>> r2 and r2+r3-1, byte granular, no minimum length) to come into coherence,
>> so that that region can be safely executed.
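
The trap takes a byte-granular region, but caches work in whole lines, so an
implementation would presumably round the region out to line boundaries. A
small sketch of that rounding follows. The function name and the 16-byte
default line size are assumptions for illustration only; a real implementation
would use the line size of its own cache hardware.

```python
def lines_covering(base, length, line_size=16):
    """Return the start address of every cache line touched by the
    byte range base .. base+length-1 (line_size is an assumed value)."""
    if length <= 0:
        return []
    last_byte = base + length - 1
    first_line = base - (base % line_size)
    last_line = last_byte - (last_byte % line_size)
    return list(range(first_line, last_line + line_size, line_size))


if __name__ == "__main__":
    # A 4-byte region that straddles a line boundary touches two lines.
    print([hex(a) for a in lines_covering(0x100E, 4)])
```

A one-byte region still costs a full line of synchronization, which is why
"no minimum length" is cheap to promise at the interface.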


  I support your request for the trap to synchronize a region of the
address space.  However unless the memctl change I propose is
passed,  there isn't a need for the trap since it will still be
illegal to have read-write-execute pages.   

  My immediate concern is getting this rather obvious memctl() flaw
fixed in the spec and thus I proposed something with very little 
implementation cost.   The 88open folks tell me that even this little
change will have a hard time getting passed.  So I'm concerned that if
I start asking for more (like the trap you suggested), that it won't
have a chance of getting passed.  

  The primary reason the 88open people gave for the current state of
memctl() is that multiprocessor systems couldn't handle the memctl()
change I proposed.  As you are familiar with 88k multiprocessor systems,
can you tell me if this is true?  Is there anything (that you can
reveal) about future versions of the 88k chipset that will make this
true?   Could you imagine anyone building their own caching system
(not the 88200) that supports a paging version of Unix yet which 
can't support the memctl() change?

-john foderaro
 franz inc.

hamilton@siberia.rtp.dg.com (Eric Hamilton) (12/22/90)

In article <JKF.90Dec19092258@frisky.Franz.COM>, jkf@Franz.COM (Sean Foderaro) writes:
|> 
|>   I support your request for the trap to synchronize a region of the
|> address space.  However unless the memctl change I propose is
|> passed,  there isn't a need for the trap since it will still be
|> illegal to have read-write-execute pages.   
|>
I suspect that our postings have crossed in the mail....

I've addressed this point in a "Read/write/execute Proposal"
which I posted to comp.sys.m88k.  And, yes, I agree that the memctl()
change or something similar is needed *as well as* the cache synchronizing
trap.

|>   The primary reason the 88open people gave for the current state of
|> memctl() is that multiprocessor systems couldn't handle the memctl()
|> change I proposed.  As you are familiar with 88k multiprocessor systems,
|> can you tell me if this is true?  Is there anything (that you can
|> reveal) about future versions of the 88k chipset that will make this
|> true?   Could you imagine anyone building their own caching system
|> (not the 88200) that supports a paging version of Unix yet which 
|> can't support the memctl() change?
|> 
There has to have been a misunderstanding somewhere along the line.

It is true that, in general, the cache synchronization trap must act on
all processors' caches, but that certainly doesn't make it unimplementable.
In fact, the cache synchronization trap does exactly the same thing that
the kernel must do, in general, when paging in any executable pages
(assuming that the instruction caches don't snoop - a good assumption).
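
A toy sketch of the multiprocessor case described above: the synchronization
has to visit every processor's caches, not just the caller's. The Cpu class
and the fetch and sync_region helpers are hypothetical names for illustration,
not kernel code. The model also assumes that at most one processor holds a
dirty copy of any address, as it would if the data caches were kept coherent
with each other.

```python
class Cpu:
    """One processor with its own non-snooping instruction cache."""

    def __init__(self):
        self.dcache = {}  # dirty data not yet written back to memory
        self.icache = {}  # instructions already fetched


def fetch(cpu, memory, addr):
    # Instruction fetch: instruction cache first, then main memory.
    if addr not in cpu.icache:
        cpu.icache[addr] = memory.get(addr)
    return cpu.icache[addr]


def sync_region(cpus, memory, base, length):
    """Bring the region into coherence on every processor."""
    region = range(base, base + length)
    # First write back dirty data from all data caches ...
    for cpu in cpus:
        for addr in region:
            if addr in cpu.dcache:
                memory[addr] = cpu.dcache.pop(addr)
    # ... then discard stale instructions from all instruction caches.
    for cpu in cpus:
        for addr in region:
            cpu.icache.pop(addr, None)


if __name__ == "__main__":
    memory = {8: "old"}
    cpus = [Cpu(), Cpu()]
    fetch(cpus[1], memory, 8)      # CPU 1 caches the old instruction
    cpus[0].dcache[8] = "new"      # CPU 0 writes new code
    sync_region(cpus, memory, 8, 1)
    print(fetch(cpus[1], memory, 8))
```

The two passes are the same steps a kernel would take before letting any
processor execute a freshly paged-in page, which is the point being made about
feasibility.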

This one is technically feasible and not even especially difficult,
on uniprocessor and multiprocessor systems, with all 88000s
that I am aware of.  I cannot imagine designing a caching subsystem
for which this would not be the case.