[comp.arch] Instruction

roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (08/26/89)

I've noticed that motorola has moved from instruction continuation
(68010-30) to instruction restart (68040). So they no longer support
virtual machines. (Must be the processors got tired of puking their
insides all over the stack. :-)

Quoting the 68030 manual:

Instruction continuation is used to support virtual I/O devices in
memory-mapped input/output systems. Control and data registers for
the virtual device are simulated in the memory map. An access to a virtual
register causes a fault and the function of the register is emulated
by software.

Anybody know why this instruction discontinuation? 

Is virtual machine emulation a lovely idea whose time has come and gone? 

Or does it use too many hardware resources?

I think some (all?) of the risc processors use instruction restart (mips if
I remember correctly) so are we looking at the end of instruction continuation?


-- 
I don't know what the question means, but the answer is yes...
KLM - Koninklijke Luchtvaart Maatschappij => coneenclicker lughtfart matscarpie
Roelof Vuurboom  SSP/V3   Philips TDS Apeldoorn, The Netherlands   +31 55 432226
domain: roelof@idca.tds.philips.nl             uucp:  ...!mcvax!philapd!roelof

tim@cayman.amd.com (Tim Olson) (08/28/89)

In article <231@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes:
| I've noticed that motorola has moved from instruction continuation
| (68010-30) to instruction restart (68040). So they no longer support
| virtual machines. (Must be the processors got tired of puking their
| insides all over the stack. :-)
| 
| I think some (all?) of the risc processors use instruction restart (mips if
| I remember correctly) so are we looking at the end of instruction continuation?

Well, when most (if not all) of the instructions execute in a single
cycle, instruction continuation and instruction restart look pretty
much the same.  Especially in a load/store architecture where there
are fewer instruction side-effects.

The 29K uses instruction restart for all instructions except for loads
and stores, which cannot be restarted (in the absolute sense) because
they execute in parallel with subsequent instructions. Instead, these
instructions are continued from on-chip state registers.  Loadm (Load
Multiple) and storem (Store Multiple) are continued from the last
completed transfer if interrupted.

I think you will find this mix of restart and continuation in many
processors which have simple instructions and parallel functional
units with out-of-order completion.
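
The difference between continuing and restarting a multi-word transfer can be sketched in a few lines of Python. This is only a toy model, not the 29K's actual mechanism: the names load_multiple, state["done"] and state["transfers"] are made up for illustration, and residency is tracked per address rather than per page.

```python
class PageFault(Exception):
    """Raised when an access touches a non-resident address."""


def load_multiple(memory, resident, base, count, regs, state):
    """Copy `count` words into regs, resuming after state["done"] transfers."""
    for i in range(state["done"], count):
        addr = base + i
        if addr not in resident:
            raise PageFault(addr)
        regs[i] = memory[addr]
        state["done"] = i + 1        # models an on-chip "transfers completed" count
        state["transfers"] += 1      # total bus transfers, for comparison only


def run_load_multiple(continuation):
    memory = {100 + i: i * i for i in range(4)}
    resident = {100, 101}            # the last two words are initially not resident
    regs = [None] * 4
    state = {"done": 0, "transfers": 0}
    while True:
        try:
            load_multiple(memory, resident, 100, 4, regs, state)
            return regs, state["transfers"]
        except PageFault as fault:
            resident.add(fault.args[0])      # "OS" makes the address resident
            if not continuation:
                state["done"] = 0            # restart: forget all progress
```

In this model both modes end with the same register contents; the restart variant simply repeats the transfers that had already completed before each fault.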

	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)

shebanow@oakhill.UUCP (Mike Shebanow) (08/28/89)

In article <231@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl
(R. Vuurboom) writes:

>I've noticed that motorola has moved from instruction continuation
>(68010-30) to instruction restart (68040). So they no longer support
>virtual machines. (Must be the processors got tired of puking their
>insides all over the stack. :-)
>
>Quoting the 68030 manual:
>
>Instruction continuation is used to support virtual I/O devices in
>memory-mapped input/output systems. Control and data registers for
>the virtual device are simulated in the memory map. An access to a virtual
>register causes a fault and the function of the register is emulated
>by software.

You can still emulate virtual machines using instruction restart.  All 
you have to do is simply interpret the instruction which faulted :-\
That is, when the machine takes the exception, the stack frame will point
to the offending instruction.  At that point, software can interpret
this instruction and perform the intended operation.  The only change is
that software has to do all the work, not just part of it.

Mike Shebanow

------------------------
Disclaimer: The opinions I have presented here are my own, not Motorola's.

scott@bbxeng.UUCP (Engineering) (08/28/89)

   In article <2345@oakhill.UUCP> shebanow@oakhill.UUCP (Mike Shebanow) writes:
    In article <231@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl
    (R. Vuurboom) writes:

    >I've noticed that motorola has moved from instruction continuation
    >(68010-30) to instruction restart (68040). So they no longer support
    >virtual machines. (Must be the processors got tired of puking their
    >insides all over the stack. :-)
    >
    >[...]

    You can still emulate virtual machines using instruction restart.  All 
    you have to do is simply interpret the instruction which faulted :-\
    That is, when the machine takes the exception, the stack frame will point
    to the offending instruction.

Forgive me for showing my ignorance, but, doesn't instruction continuation
enable features such as dynamic stack allocation?  Are we doomed to
return to the antiquated "stack probe"?  Does this mean that 68030
(user mode) software will not always work correctly on the 68040?
What about page faults?  Is the operating system *really* expected 
to include an instruction set interpreter so it can simulate
instruction continuation?  The 386 is suddenly starting to look good
to me.

-- 

---------------------------------------
Scott Amspoker
Basis International, Albuquerque, NM
505-345-5232

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (08/29/89)

In article <204@bbxeng.UUCP> scott@bbxeng.UUCP (Scott-Engineering) writes:
>Forgive me for showing my ignorance, but, doesn't instruction continuation
>enable features such as dynamic stack allocation?  Are we doomed to
>return to the antiquated "stack probe"?  Does this mean that 68030
>(user mode) software will not always work correctly on the 68040?
>What about page faults?  Is the operating system *really* expected 
>to include an instruction set interpreter so it can simulate
>instruction continuation?

Well, no. Perhaps you are confusing "instruction continuation" with
"program continuation".

A normal interrupt can be ignored for a tiny amount of time. So, for
convenience, the processor will ignore an interrupt request until
the processor happens to be between instructions.

A page fault interrupt isn't like that. The instruction in progress
cannot go forward: it wants to write to a page that is out on disk
(or whatever). The interrupt has to be honored at once, and the
instruction is not completed. The operating system is invoked. The OS
does good stuff (like disk I/O) and eventually decides to let the
user program resume.  But resume where in the program? And with what
register contents, what processor state?

If the hardware has been designed to do "instruction continuation",
then the user program will resume somewhere in the middle of the
offending instruction. If the hardware has been designed for
"instruction restart", then the program will be resumed at the start
of the offending instruction. The user-visible result is the same in
both cases. 

The fun stuff comes in actually **implementing** either of these
schemes. For example, suppose the following instruction:

	load two words from @ro into r0 and r1.

What if the two words lie across a page boundary?  Hmmm!
-- 
Don		D.C.Lindsay 	Carnegie Mellon School of Computer Science

scott@bbxeng.UUCP (Engineering) (08/29/89)

In article <5990@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>
>If the hardware has been designed to do "instruction continuation",
>then the user program will resume somewhere in the middle of the
>offending instruction. If the hardware has been designed for
>"instruction restart", then the program will be resumed at the start
>of the offending instruction. The user-visible result is the same in
>both cases. 
>
I guess this is where I'm having a problem.  What if the instruction
involved address increment/decrement modes?  Restarting the instruction
might not give the exact same result unless the results of auto-inc/dec
were not placed into the affected registers until the instruction completes.

I remember reading some literature when the 68010 came out explaining the
wonderful benefits of instruction continuation and why instruction restart
did not always solve the problem.  (I don't remember *where* I read this.)
Now I'm hearing that it doesn't really matter.  Instruction restart makes
a lot more sense to me as long as the side effects of the instruction are
not interruptible.  Is this the case with the 68040?

-- 

---------------------------------------
Scott Amspoker
Basis International, Albuquerque, NM
505-345-5232

mash@mips.COM (John Mashey) (08/29/89)

In article <5990@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>In article <204@bbxeng.UUCP> scott@bbxeng.UUCP (Scott-Engineering) writes:
>>Forgive me for showing my ignorance, but, doesn't instruction continuation
>>enable features such as dynamic stack allocation?......
.....
>Well, no. Perhaps you are confusing "instruction continuation" with
>"program continuation".
.....
>If the hardware has been designed to do "instruction continuation",
>then the user program will resume somewhere in the middle of the
>offending instruction. If the hardware has been designed for
>"instruction restart", then the program will be resumed at the start
>of the offending instruction. The user-visible result is the same in
>both cases. 
>
>The fun stuff comes in actually **implementing** either of these
>schemes. For example, suppose the following instruction:
>
>	load two words from @ro into r0 and r1.

I think this last must have meant @r0 into r0 and r1.

All of this is why most RISC machines:
	a) Use load/store architectures, with zero (or very few)
	side-effects.
	b) Generally require loads/stores to access aligned data
	objects, or (more generally), at least forbid any kind
	of load/store from crossing a boundary.
	c) Usually do instruction-restart, or something close.
	Note that restart-vs-continue is not a binary decision.
	Some CPUs that mostly do restart may have some flavor of
	continuation in certain cases, i.e., with imprecise exceptions,
	and sometimes with branch-delay-slot things, or with
	emulation of missing FPUs, etc, etc.
	The more fundamental issue is how much state does it take
	to figure out where you were and get back there.  At the
	minimum, this is just a PC.  At worst, the processor dumps a
	lot of mysterious stuff somewhere.
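
Point (b) can be checked with a little arithmetic. The following is a minimal sketch, assuming a 4 KB page size and the made-up helper name crosses_page: a naturally aligned access whose size is a power of two no larger than the page can never straddle a page boundary, so it can take at most one data page fault.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def crosses_page(addr, size, page_size=PAGE_SIZE):
    """True if the access [addr, addr + size) touches more than one page."""
    return addr // page_size != (addr + size - 1) // page_size


# An unaligned 4-byte access two bytes before a page boundary straddles it.
unaligned_crosses = crosses_page(PAGE_SIZE - 2, 4)

# No naturally aligned 1-, 2-, 4- or 8-byte access crosses a page boundary.
aligned_never_cross = not any(
    crosses_page(addr, size)
    for size in (1, 2, 4, 8)
    for addr in range(0, 2 * PAGE_SIZE, size)
)
```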
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

dswartz@bbn.com (Dan Swartzendruber) (08/29/89)

Clearly it has to make some difference, given that the 680x0 processors
support auto-increment/decrement of address registers!  The PDP-11 had
the same problem.  I seem to recall they solved it by having a diagnostic
register in which the CPU wrote which registers had been incremented or
decremented and by how much.  That wasn't as bad as it might first seem.
There are only two registers which can change as a result of any given
instruction and they could only change by 1, 2 or 4.  It's been a while
since I hacked on a PDP-11, so I might be off a little here, but that
was the basic gist....  Maybe the 68040 does something similar?  It
certainly can't be any uglier than the sh*t the current processors have
with eight gazillion different types of fault frames which can change
incompatibly as the microcode is updated....

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (08/29/89)

In article <205@bbxeng.UUCP> scott@bbxeng.UUCP (Scott-Engineering) writes:
>I guess this is where I'm having a problem.  What if the instruction
>involved address increment/decrement modes?  Restarting the instruction
>might not give the exact same result unless the results of auto-inc/dec
>were not placed into the affected registers until the instruction completes.

Sorry, I should have stated explicitly the strong requirement that is
placed on the OS:

	The user program must not be able to notice that anything
	happened (except in second-order ways, such as the time of
	day jumping ahead).

Notice I said the OS. That's because the hardware doesn't necessarily
do it all. It is fine if the processor merely leaves enough
information so that the OS can sort things out. For instance, on the
old PDP-11/45, there were special registers, which recorded whether
the interrupted instruction had autoincremented any registers, and if
so, which ones. First, the OS would copy the user's register set to
some convenient place in memory. Then, using this special
information, the OS would undo any incrementation. Later, the values
in memory would be reloaded, and the offending instruction would be
restarted. The instruction would "see" the same values that it saw
the previous time.

Alternatively, the hardware can have "shadow registers". At the
beginning of each instruction, they are made equal to the normal
registers. If the instruction faults, then simply store the shadow to
memory, instead of storing the "foreground" register set to memory.
This nicely avoids the undo problem, and replaces it with lots and
lots of silicon. Or, as you suggested, the register updates can be
postponed until the end of the instruction. On most machines this
would be simpler but slower.
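
The undo scheme can be modelled roughly as follows. This is a sketch under stated assumptions, not the real PDP-11/45 register layout: the Machine class, the undo_log list and the per-address residency set are all hypothetical, and the instruction modelled is a two-byte move with autoincrement on both operands.

```python
class PageFault(Exception):
    """Raised when an access touches a non-resident address."""


class Machine:
    def __init__(self, memory, resident, r1, r2):
        self.regs = {"r1": r1, "r2": r2}
        self.memory = memory        # address -> value
        self.resident = resident    # set of resident addresses (simplification)
        self.undo_log = []          # (register, delta) pairs recorded by "hardware"

    def _autoinc(self, reg, step):
        addr = self.regs[reg]
        self.regs[reg] += step
        self.undo_log.append((reg, step))
        return addr

    def _check(self, addr):
        if addr not in self.resident:
            raise PageFault(addr)

    def move_autoinc(self):
        """Model of mov (r1)+,(r2)+ ; may fault after r1 has been bumped."""
        self.undo_log.clear()
        src = self._autoinc("r1", 2)
        self._check(src)
        value = self.memory[src]
        dst = self._autoinc("r2", 2)
        self._check(dst)
        self.memory[dst] = value


def run_with_restart(machine):
    """OS side: undo recorded increments, fix the fault, restart."""
    faults = 0
    while True:
        try:
            machine.move_autoinc()
            return faults
        except PageFault as fault:
            faults += 1
            for reg, delta in reversed(machine.undo_log):
                machine.regs[reg] -= delta      # roll the register back
            machine.resident.add(fault.args[0])
```

Without the undo step, the restarted instruction in this model would start from r1 already advanced and would read from the wrong source address.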

-- 
Don		D.C.Lindsay 	Carnegie Mellon School of Computer Science

shebanow@oakhill.UUCP (Mike Shebanow) (08/29/89)

In article <204@bbxeng.UUCP> scott@bbxeng.UUCP (Scott-Engineering) writes:
>
>   In article <2345@oakhill.UUCP> shebanow@oakhill.UUCP (Mike Shebanow) writes:
>    In article <231@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl
>    (R. Vuurboom) writes:
>    
>    >I've noticed that motorola has moved from instruction continuation
>    >(68010-30) to instruction restart (68040). So they no longer support
>    >virtual machines. (Must be the processors got tired of puking their
>    >insides all over the stack. :-)
>    >
>    >[...]
>    
>    You can still emulate virtual machines using instruction restart.  All 
>    you have to do is simply interpret the instruction which faulted :-\
>    That is, when the machine takes the exception, the stack frame will point
>    to the offending instruction.
>
>Forgive me for showing my ignorance, but, doesn't instruction continuation
>enable features such as dynamic stack allocation?  Are we doomed to
>return to the antiquated "stack probe"?  Does this mean that 68030
>(user mode) software will not always work correctly on the 68040?
>What about page faults?  Is the operating system *really* expected 
>to include an instruction set interpreter so it can simulate
>instruction continuation?  The 386 is suddenly starting to look good
>to me.
>
>---------------------------------------
>Scott Amspoker
>Basis International, Albuquerque, NM
>505-345-5232

Sorry about the long reply.

I believe (but I am willing to be proved wrong) that anything that can
be done using instruction continuation can also be done using restart.
This includes dynamic stack allocation.  Using that as an example, when
a stack overrun occurs (decrements below allocated memory), a page fault
will occur.  In the restart model, the offending instruction will be
undone.  In general, most machines using restart will store exception
cause information in supervisor visible registers.  This will indicate
why the exception occurred (MMU fault), what happened (translation not valid -
page fault), where it happened (some virtual address - In Unix for example,
a stack fault would be obvious by inspection of the address) and other
pertinent information.  Once the operating system has determined the cause, it
can allocate new memory and simply restart the instruction. Assuming that
there are no other problems with the offending instruction, all will
proceed as normal.
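
The stack-growth case might look something like this in outline. It is a toy sketch, not any real kernel's fault handler: the constants STACK_TOP and STACK_LIMIT, the Process class and the handle_fault function are invented for illustration, and the push only checks the lowest address it touches.

```python
PAGE = 4096                              # assumed page size
STACK_TOP = 0x80000000                   # hypothetical top of the stack region
STACK_LIMIT = STACK_TOP - 64 * PAGE      # hypothetical maximum stack extent


class PageFault(Exception):
    def __init__(self, addr):
        super().__init__(addr)
        self.addr = addr                 # faulting virtual address


class Process:
    def __init__(self):
        self.mapped = {STACK_TOP // PAGE - 1}   # one stack page mapped at start
        self.sp = STACK_TOP

    def push(self, nbytes):
        """Restartable push: sp changes only after the access succeeds."""
        new_sp = self.sp - nbytes
        if new_sp // PAGE not in self.mapped:   # checks the lowest address only
            raise PageFault(new_sp)
        self.sp = new_sp


def handle_fault(proc, fault):
    """OS side: a fault inside the stack region is treated as stack growth."""
    if STACK_LIMIT <= fault.addr < STACK_TOP:
        proc.mapped.add(fault.addr // PAGE)     # allocate the new page
        return True
    return False                                # not a stack fault


def push_with_restart(proc, nbytes):
    faults = 0
    while True:
        try:
            proc.push(nbytes)
            return faults
        except PageFault as fault:
            if not handle_fault(proc, fault):
                raise
            faults += 1                         # then restart the instruction
```

The handler needs nothing beyond the faulting address; no part of the instruction is interpreted in software.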

In response to your question about interpreters in operating systems,
no, I don't think an operating system using a restart machine needs to have
a built-in interpreter.  Page faults, for example, would be handled in
a manner similar to the way stack faults are handled: the fault gets
logged in hardware registers and restart is used to reexecute the faulting
instruction once the page has been swapped in.  The OS doesn't need any
more detailed information than that.  

So when is an interpreter necessary?  If a complete virtual machine
is to be emulated, and such a machine includes such things as virtual
memory mapped I/O devices, then interpretation may be necessary.
For example, assume that some type of I/O device is mapped into user
memory.  The OS wants the user to be able to read the device normally, but
if a write is attempted, some other action should happen.  BUT, the OS
wants the user program to think that it HAS written the device.  In
certain machines (which support memory-memory operations), a read of the
device may already have happened before the write part of the instruction
attempts to write the device.  If reading the device is destructive (meaning
you can only read it once), you cannot use restart on the instruction.
If you did, you would read the device twice.  In this particular case, it
might be necessary for the OS to complete the instruction on behalf of
the hardware.

The only real difference between a restart machine and a continuation machine
is (a) how much work needs to be done by hardware to save enough
state so that instruction restart or continuation is possible and (b)
how much work needs to be redone once the problem is corrected.

Mike Shebanow
--------------
Disclaimer: the opinions presented here are my own, not Motorola's.

paul@taniwha.UUCP (Paul Campbell) (08/29/89)

In article <204@bbxeng.UUCP> scott@bbxeng.UUCP (Scott-Engineering) writes:
>
>Forgive me for showing my ignorance, but, doesn't instruction continuation
>enable features such as dynamic stack allocation?  Are we doomed to
>return to the antiquated "stack probe"?  Does this mean that 68030
>(user mode) software will not always work correctly on the 68040?
>What about page faults?  Is the operating system *really* expected 
>to include an instruction set interpreter so it can simulate
>instruction continuation?  The 386 is suddenly starting to look good
>to me.

No - the need for 'stack probe' etc was caused by the 68000 (fixed in
the 68010) which couldn't resume (either by restart or by continuation) from
a bus error (page fault) [it didn't keep enough information around in
its bus error stack frame to tell what it had done, or undo it itself
while delivering the bus error]. Some early vendors actually had 2 68000s,
one which executed the user mode code and was halted in mid instruction
while the other was started to fix the problem ..... thus preserving the
internal state ....

I'm sure Motorola won't make this mistake again .....

	Paul

-- 
Paul Campbell    UUCP: ..!mtxinu!taniwha!paul     AppleLink: D3213
"Free Market": n. (colloq.) a primitive fertility goddess worshipped by an
obscure cult in the late 20th C. It's chief priest 'Dow Jones' was eventually
lynched by an enraged populace during an economic downturn (early 21st C).

scott@bbxeng.UUCP (Engineering) (08/29/89)

In article <5995@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
 
    Sorry, I should have stated explicitly the strong requirement that is
    placed on the OS:
 
 	   The user program must not be able to notice that anything
 	   happened (except in second-order ways, such as the time of
 	   day jumping ahead).
 

Just as long as we don't have to go back to stack probes.

-- 

---------------------------------------
Scott Amspoker
Basis International, Albuquerque, NM
505-345-5232

rajivp@sunshade.Sun.COM (Rajiv Patel) (08/30/89)

In article <26418@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>	The more fundamental issue is how much state does it take
				      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>	to figure out where you were and get back there.  At the
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>	minimum, this is just a PC.  At worst, the processor dumps a
>	lot of mysterious stuff somewhere.
>-- 
>-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>

  To further the discussion on this issue ....

  There has been lots of discussion about superscalar and VLIW architectures.
  How about instruction continuation and/or restart for these architectures.
  It seems that a N way super-scalar machine would need N PC's and a lot of
  state information. For VLIW machines it seems that instruction continuation
  might be better, since restart might first need to roll back partially completed
  operations. 

 Rajiv Patel.

srg@quick.COM (Spencer Garrett) (08/30/89)

Instruction restart on a CISC can cause grievous problems when there
are side effects to reading or writing certain addresses.  Consider,
for instance, the following instruction which reads a device register
and stores it into memory.  (This is 68xxx code.)

	movb	a0@(2),_memloc

This instruction could fault after the read of the device register
if either _memloc or the tail end of the instruction isn't resident.
Many devices will clear interrupt bits when you read their status
register or shift in the next byte when you read their data
register, so repeating the entire instruction doesn't give equivalent
results.  One could code this (in C) as

	register  char temp;

	temp = dev->reg;
	memloc = temp;

and hope the compiler doesn't optimize too aggressively, or maybe
(someday) use "volatile" to indicate to the compiler that dev->reg
is special, but I wouldn't bet any current compilers take instruction
restart into account when generating code for volatile data fetches.

For the moment this isn't a major problem, since Unix device drivers
usually run in physical memory (so any fault that occurs is fatal,
and won't be rerun), but I long for the day when this isn't necessarily
so.
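
The hazard with a destructive device read can be simulated in a few lines. This is a toy model with made-up names (Fifo, move_dev_to_mem, state["latch"]); it only illustrates the idea that a restarted memory-to-memory move repeats the device read, while a continued one reuses the value it already fetched.

```python
from collections import deque


class PageFault(Exception):
    """Raised when the destination is not resident."""


class Fifo:
    """Device data register with a destructive read: each read pops a byte."""

    def __init__(self, data):
        self.queue = deque(data)

    def read(self):
        return self.queue.popleft()


def move_dev_to_mem(dev, mem, resident, state):
    """One memory-to-memory move from the device register to 'memloc'."""
    if state["latch"] is None:
        state["latch"] = dev.read()      # the side effect happens here
    if "memloc" not in resident:
        raise PageFault("memloc")        # fault after the device was read
    mem["memloc"] = state["latch"]
    state["latch"] = None


def run_device_move(continuation):
    dev, mem, resident = Fifo([0x11, 0x22]), {}, set()
    state = {"latch": None}              # saved internal state of the instruction
    while True:
        try:
            move_dev_to_mem(dev, mem, resident, state)
            return mem["memloc"]
        except PageFault:
            resident.add("memloc")       # "OS" pages in the destination
            if not continuation:
                state["latch"] = None    # restart: internal state is discarded
```

In the restart case the first byte is consumed by the device and never reaches memory, which is the non-equivalence described above.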

shebanow@oakhill.UUCP (Mike Shebanow) (08/30/89)

In article <123909@sun.Eng.Sun.COM> rajivp@sun.UUCP (Rajiv Patel) writes:
>  state information. For VLIW machines it seems that instruction continuation
		      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  might be better for restart might first need to roll back partially completed
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  operations. 
   ^^^^^^^^^^^
>

This depends on how often you expect roll back to be necessary.  From the
perspective of a single instruction, all the data I have ever seen indicates
that exceptions occur very infrequently.  (on most machines, the most
frequent exception is a page fault).  So, is this really an issue?
In addition, what fraction of time does the lost work account for in
comparison to the amount of time spent by the OS just trying to figure
out what the exception was and how to deal with it?

Mike Shebanow

PS: I do not consider OS calls (traps) or interrupts to be an exception --
the hardware can plan for those and hence not require any roll back.
-------------
Disclaimer: the opinions expressed here are my own, not Motorola's.

GPWRDCS@gp.govt.nz (Don Stokes, GPO) (09/02/89)

In article <205@bbxeng.UUCP>, scott@bbxeng.UUCP (Engineering) writes:
> In article <5990@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>>
>>If the hardware has been designed to do "instruction continuation",
>>then the user program will resume somewhere in the middle of the
>>offending instruction. If the hardware has been designed for
>>"instruction restart", then the program will be resumed at the start
>>of the offending instruction. The user-visible result is the same in
>>both cases. 
>>
> I guess this is where I'm having a problem.  What if the instruction
> involved address increment/decrement modes?  Restarting the instruction
> might not give the exact same result unless the results of auto-inc/dec
> were not placed into the affected registers until the instruction completes.
> 
> I remember reading some literature when the 68010 came out explaining the
> wonderful benefits of instruction continuation and why instruction restart
> did not always solve the problem.  (I don't remember *where* I read this.)
> Now I'm hearing that it doesn't really matter.  Instruction restart makes
> a lot more sense to me as long as the side effects of the instruction are
> not interruptible.  Is this the case with the 68040?
> 

I think I saw something similar in the depths of my MC68020 User's Guide
(I love the way Motorola call the technical docs for a processor a
"User's Guide").  Might dig it out sometime, but I think the gist of it 
was as follows:

If you execute an instruction:

        MOVE $300, $1000

and location $1000 was not in physical memory, the OS would have to bring 
the page into physical memory.  If the instruction continues after the 
pagefault completes, all is fine.  If the instruction restarts, it has to 
access location $300 again.  However, it is possible that in a tight 
memory situation, the act of paging in $1000 could page $300 out, and 
vice-versa.  While an instruction continuation wouldn't mind, an 
instruction restart would result in an infinite loop paging $1000 in and 
$300 out, restarting, paging $300 in and $1000 out and restarting.
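
This ping-pong can be reproduced with a toy pager. The sketch below assumes a made-up TinyMemory class with oldest-first eviction, treats the instruction's own page as always resident, and gives up after a fixed number of faults so that the restart case terminates.

```python
class PageFault(Exception):
    """Raised when a touched page is not resident."""


class TinyMemory:
    """Pager with room for only `frames` data pages; evicts the oldest."""

    def __init__(self, frames):
        self.frames = frames
        self.resident = []       # oldest page first
        self.faults = 0

    def touch(self, page):
        if page not in self.resident:
            raise PageFault(page)

    def page_in(self, page):
        self.faults += 1
        if len(self.resident) >= self.frames:
            self.resident.pop(0)             # evict the oldest page
        self.resident.append(page)


def run_move(mem, src_page, dst_page, continuation, max_faults=20):
    """Return the fault count if the move completes, else None."""
    steps = [src_page, dst_page]             # read source, then write destination
    done = 0                                 # steps completed so far
    while mem.faults < max_faults:
        start = done if continuation else 0  # restart always begins at step 0
        try:
            for i in range(start, len(steps)):
                mem.touch(steps[i])
                done = i + 1
            return mem.faults
        except PageFault as fault:
            mem.page_in(fault.args[0])
    return None                              # gave up: the move never completed
```

With a single data frame the restart variant never finishes in this model, while the continuation variant completes after two faults; with two frames both complete after two faults.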

While that case could probably be coded around for a simple two or three
operand case (by ensuring that the last two or three pages accessed by a
process remain in memory, or perhaps something cleverer?), things get
somewhat messier when a block move instruction is executed, such as the
VAX instruction MOVC3, which can move up to 64KB in one go, which could
cross a lot of pages (of course MOVC3 is continuable on the VAX...). 
This wouldn't be a problem on the 680x0, as these processors (unless
they've added block moves to the 68030 and/or 68040) do the grunt work of
block moves in two instructions, eg: 

                MOVE.L  #srcaddr, A0
                MOVE.L  #dstaddr, A1
                MOVE.W  #len-1, D0      ; DBRA counts down to -1
        loop:   MOVE.B  (A0)+, (A1)+
                DBRA    D0, loop

(which of course explains why the 68010 (and '012) has a two instruction
cache). 

Don Stokes, Systems Programmer    /  /   Domain:                  don@gp.govt.nz
Government Printing Office,      /GP/   PSImail:          PSI%0530147000028::DON
Wellington, New Zealand         /  /   Bang:    ...!uunet!vuwcomp!windy!gpwd!don
--------------------------------------------------------------------------------
When the going gets tough, upgrade.

bruce@blender.UUCP (Bruce Thompson) (09/02/89)

In article <5990@pt.cs.cmu.edu>, lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
	/* discussion about interrupt requests deleted */

> A page fault interrupt isn't like that. The instruction in progress
> cannot go forward: it wants to write to a page that is out on disk
> (or whatever). The interrupt has to be honored at once, and the
> instruction is not completed. The operating system is invoked. The OS
> does good stuff (like disk I/O) and eventually decides to let the
> user program resume.  But resume where in the program? And with what
> register contents, what processor state?
> 
> If the hardware has been designed to do "instruction continuation",
> then the user program will resume somewhere in the middle of the
> offending instruction. If the hardware has been designed for
> "instruction restart", then the program will be resumed at the start
> of the offending instruction. The user-visible result is the same in
> both cases. 
> 

Forgive me if I am re-hashing stuff which has been covered before.

Instruction continuation helps to prevent a condition known as `thrashing'
due to page-faults. An example:

    A machine executes an instruction like the following:
	mov.l	label1, label2

    Assume that the instruction, the source and the destination reside on
different pages in VM. The worst case occurs when both data pages have
been swapped out to disk. The sequence of operations for an `instruction
re-start' machine can occur like this:
	1. fetch the instruction
	2. attempt to fetch the source operand
	3. page fault occurs. Block the process waiting for the page. In
	   most cases, another process is continued until the page has
	   been swapped in.
	4. Re-start the instruction. While waiting for the new page, the
	   page with the instruction has been swapped out. This causes
	   another page fault.
	5. Block the process while the text page is swapped in. Other
	   processes run while waiting.
	6. Re-start the instruction.
	7. goto step 2.

    This sequence can occur for an extremely long period, possibly
locking up the entire machine, certainly severely degrading performance,
particularly where physical memory is in short supply.

    Where a processor can continue instructions rather than re-start them
the worst-case sequence is:
	1. fetch the instruction
	2. attempt to fetch the source operand
	3. page-fault occurs. Block the process waiting for the new page.
	   Allow other processes to run.
	4. continue instruction
	5. fetch the source operand
	6. attempt to store operand
	7. page-fault occurs. Block the process until the new page is
	   fetched. Again, other processes are allowed to run.
	8. continue instruction
	9. store the operand

    This demonstrates that on a continuable instruction machine (MC68020
etc.), the paging overhead can be noticeably reduced as compared with a
re-start machine.

    From the user's point of view, there will be less paging activity and
better performance on a `continuable' machine.

dswartz@bbn.com (Dan Swartzendruber) (09/03/89)

I think this is somewhat of a misleading point.  Ignoring the block move
case, a generic two-operand instruction on most any machine could take
up to six page faults (the instruction spans a page boundary, and each
of the operands does as well.)  Although it is possible to construct a
theoretical scenario where an instruction restart CPU could get into an
infinite page-fault loop, I will respectfully suggest that if your system
only has 6 free pages of memory, the effective result (as far as getting
any useful work done is concerned) will be pretty much the same!
Not to mention that this scenario assumes almost complete brain-damage on
the parts of: the compiler, the linker, the user and the sys admin...
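
The six-page figure is easy to verify by counting. A minimal sketch, assuming 4 KB pages and a deliberately contrived layout (the addresses and the helper name pages_touched are made up) in which the instruction and both operands each straddle a page boundary:

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def pages_touched(addr, size, page_size=PAGE_SIZE):
    """Set of page numbers covered by the access [addr, addr + size)."""
    first = addr // page_size
    last = (addr + size - 1) // page_size
    return set(range(first, last + 1))


# Contrived worst case: every piece crosses a boundary, no pages shared.
instruction = pages_touched(1 * PAGE_SIZE - 2, 6)    # 6-byte instruction
source = pages_touched(11 * PAGE_SIZE - 1, 4)        # 4-byte source operand
destination = pages_touched(21 * PAGE_SIZE - 3, 4)   # 4-byte destination operand

worst_case_pages = instruction | source | destination
```

If all six of these pages can be resident at once, a restarted instruction in this model must eventually run to completion.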

blarson@basil.usc.edu (bob larson) (09/04/89)

In article <2353@oakhill.UUCP> shebanow@oakhill.UUCP (Mike Shebanow) writes:
>I believe (but I am willing to be proved wrong) that anything that can
>be done using instruction continuation can also be done using restart.

Since instruction continuation requires the ability to save and restore
internal state information, this could be (ab)used in various ways.
The only reasonable one I can think of is for chip testing.  (For both
design and manufacturing defects.)  See the periodically repeating discussion
of testability on sci.electronics.

-- 
Bob Larson	Arpa:	blarson@basil.usc.edu
Uucp: {uunet,cit-vax}!usc!basil!blarson
Prime mailing list:	info-prime-request%ais1@usc.edu
			usc!ais1!info-prime-request

seanf@sco.COM (Sean Fagan) (09/04/89)

In article <45180@bbn.COM> dswartz@BBN.COM (Dan Swartzendruber) writes:
>Although it is possible to construct a
>theoretical scenario where an instruction restart CPU could get into an
>infinite page-fault loop, I will respectfully suggest that if your system
>only has 6 free pages of memory, the effective result (as far as getting
>any useful work done is concerned) will be pretty much the same!

Somebody else also gave the example:

	movw	$300, $100

(obviously a VAX 8-)).  Dan is, I think, following up to the comment that,
if you're tight on free memory, swapping the page with $100 could cause the
page with $300 to be swapped out (and vice-versa), which would cause real
problems with instruction restart.

Now, my point:  how about shared memory?  (SysV-type shared memory, not
multi-processor-type shared memory.)  With instruction restart, the value in
$300 could have changed, while, with instruction continuation, it doesn't
matter.  How do various OS's and hardwares handle it?

-- 
Sean Eric Fagan  | "Time has little to do with infinity and jelly donuts."
seanf@sco.UUCP   |    -- Thomas Magnum (Tom Selleck), _Magnum, P.I._
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (09/04/89)

In article <265@gp.govt.nz> GPWRDCS@gp.govt.nz (Don Stokes, GPO) writes:
>In article <205@bbxeng.UUCP>, scott@bbxeng.UUCP (Engineering) writes:
>> 
>> I remember reading some literature when the 68010 came out explaining the
>> wonderful benefits of instruction continuation and why instruction restart
>> did not always solve the problem.  (I don't remember *where* I read this.)
>> Now I'm hearing that it doesn't really matter.  Instruction restart makes
>> a lot more sense to me as long as the side effects of the instruction are
>> not interruptible.  Is this the case with the 68040?

To begin with the last point first, it's my understanding that due to
data prefetching and the restart exception model the same location can
sometimes be accessed twice (which can be, uhmmm, unpleasant for
memory-mapped i/o), which is why motorola no longer claims virtual machine
support for the 68040. My understanding could be flawed though...

But maybe we can second-guess motorola. How about the following scenario:

In order to improve performance, motorola decides to pipeline heavily.
A heavy pipeline means a lot of data prefetching. Now with all that
prefetched data we suddenly run into an exception...problems. Best thing
is to throw it all away and start anew. Now the whole point of instruction
continuation was to know which locations were already accessed. Since this
could no longer be supported, might as well go over to the simpler
instruction restart. So the real trade-off was performance vs virtual
machine support.

Could this be the story behind the instruction continuation discontinuation? 
Stay tuned :-)

As for the first point:

The 68010 programmers reference manual motivates instruction continuation
as follows (1.4.1 Virtual Memory p.8):

"The MC68010 uses instruction continuation rather than instruction restart
to support virtual memory. With instruction restart, the processor must
remember the exact state of the system before each instruction is started
in order to restore that state if a page fault occurs during its
execution. Then, after the page fault has been repaired, the entire 
instruction that caused the fault is reexecuted. With instruction
continuation, when a page fault occurs the processor stores its internal
state and then after the page fault is repaired, restores that internal
state and continues execution of the instruction. In order for the
mc68010 to utilize instruction continuation, it stores its internal
state on the supervisor stack when a bus cycle is terminated with a
bus error signal......

Instruction continuation has the additional advantage of allowing hardware
support for virtual i/o devices. Since virtual registers may be simulated
in the memory map, an access to such a register will cause a fault and the
function of the register can be emulated by software."

Now, especially in the light of preceding discussion in this group, it is
not clear to me how the first paragraph above motivates instruction
continuation over restart. Apparently the motorola folks seem to agree since
in the 68020 Users Manual we read the following (1.3.1 Virtual memory p.1-7):

"The MC68020 uses instruction continuation to support virtual memory. In
order for the mc68020 to use instruction continuation, it stores its
internal state...

...Instruction continuation is crucial to the support of virtual i/o devices
in memory-mapped input/output systems. Since virtual registers may be
simulated in the memory map, an access to such a register will cause a
fault and the function of the register can be emulated in software."

Note the two differences: first, no more mention of instruction restart
and second, what was first an "additional advantage" has now become
"crucial".

For the 68030 we see again a changed viewpoint:

In the 68030 users manual we read (in 1.6.1 Virtual Memory p.1-11):

"The mc68030 uses instruction continuation to support virtual memory..."

and in 1.6.2 Virtual Machine:

"Instruction continuation is used to support virtual i/o devices in memory-mapped
input/output systems. Control and data registers for the virtual device are
simulated in the memory map. An access to a virtual register causes a fault
and the function of the register is emulated by software."

Note the differences: first, instruction continuation is no longer "crucial"
to support memory mapped i/o devices and second, the motorola folks have
finally figured out that memory-mapping i/o devices is a virtual machine
concept and not a virtual memory concept.

And finally of course (what started this whole thread): the 68040's
non-support of instruction continuation.

>
>I think I saw something similar in the depths of my MC68020 User's Guide
>(I love the way Motorola call the technical docs for a processor a
>"User's Guide").  Might dig it out sometime, but I think the gist of it 
>was as follows:
>
[ Example follows]

You might want to dig it out sometime since I (for one) couldn't find it.

-- 
wiskunde: Dutch for mathematics. Literally: Knowledge of certainty   
wis: certainty			kunde: 	Knowledge
Roelof Vuurboom  SSP/V3   Philips TDS Apeldoorn, The Netherlands   +31 55 432226
domain: roelof@idca.tds.philips.nl             uucp:  ...!mcvax!philapd!roelof

deraadt@enme3.ucalgary.ca (Theo Deraadt) (09/05/89)

In article <3267@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes:
>Now, my point:  how about shared memory?  (SysV-type shared memory, not
>multi-processor-type shared memory.)  With instruction restart, the value in
>$300 could have changed, while, with instruction continuation, it doesn't
>matter.  How do various OS's and hardwares handle it?
Gad. How about a hardware FIFO? How about a serial receive buffer on
your generic serial chip? I really doubt this rumour, unless some special
trick (like maybe modifying the actual instruction in the code cache
before the restart to skip the already done part, yes, sorry, sick idea)
was to be done, it just has too many differences from the current 030
and 020/851 setup to be possible.
 <tdr.

Theo de Raadt                    (403) 289-5894     Calgary, Alberta, Canada

johnz@grapevine.uucp (John Zolnowsky ext. 33230) (09/06/89)

In article <241@ssp1.idca.tds.philips.nl>, roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes:
> But maybe we can second-guess motorola. How about the following scenario:
> 
> In order to improve performance, motorola decides to pipeline heavily,
> heavy pipeline means a lot of data prefetching. Now with all that
> prefetched data we suddenly run into an exception...problems. Best thing
> is throw it all away and start anew. Now the whole point of instruction
> continuation was to know which locations were already accessed. Since this 
> could no longer be supported might as well go over to  the simpler 
> instruction restart. So the real trade-off was performance vs virtual 
> machine support.

The 68000 was designed with three stages of prefetch, all controlled by
microcode.  The microcode was free to manage the prefetch, external bus,
and internal data operations in any order.  The actual order was determined
by the microcode, optimizing for performance.  The values of user visible
registers could be invalid, while current values were held in temporary
registers.

Although virtual memory was desired for the 68000, it was deemed too costly
to provide an instruction restart model.  This would have required many more
temporary registers and data paths to capture user visible values at the
instruction dispatch, and to restore them at a bus error.  The option of
restricting the microcode usage of temporaries and control of the prefetch
would have impaired the performance of the processor.

After the 68000 went to market, the instruction continuation model was
conceived.  This is best understood as an interrupt at the microcode level.
The "stack dump" is a context switch, and the RTE which does the stack
restore is a context switch back.  This model required only a few new
registers, and only one new data path.  This same data path formed the
basis for the 68010 loop mode.

Presumably, in later processors from the family, the provision of extra
hardware to reduce instruction cycle counts also leads to a reduction
in the indeterminacy of the values of registers.  This makes the cost
of instruction restart more tractable.

-John Zolnowsky				...!sun!johnz or johnz@sun.com

philhowr@unix.cie.rpi.edu (Bob Philhower) (09/06/89)

In article <3267@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes:
>In article <45180@bbn.COM> dswartz@BBN.COM (Dan Swartzendruber) writes:
>
>Now, my point:  how about shared memory?  (SysV-type shared memory, not
>multi-processor-type shared memory.)  With instruction restart, the value in
>$300 could have changed, while, with instruction continuation, it doesn't
>matter.  How do various OS's and hardwares handle it?

I contend that the possibility of the source location being changed
during a page fault on the destination is a non-issue.  If there had
been no page fault, the "old" value would have been written.
Designers who need to worry about this possibility should really be
thinking about some sort of semaphore to prevent writing during a
read.

Robert Philhower			            (philhowr@unix.cie.rpi.edu)
Rensselaer Center for Integrated Electronics                     
CII 6111 / Rensselaer Polytechnic Institute / Troy, NY  12180 / USA

dennis@masscomp.UUCP (Dennis Rockwell) (09/06/89)

In article <44908@bbn.COM> dswartz@BBN.COM (Dan Swartzendruber) writes:
>	[ ... ]		The PDP-11 had
>the same problem.  I seem to recall they solved it by having a diagnostic
>register in which the CPU wrote which registers had been incremented or
>decremented and by how much.  That wasn't as bad as it might first seem.
>There are only two registers which can change as a result of any given
>instruction and they could only change by 1, 2 or 4.  It's been a while
>since I hacked on a PDP-11, so I might be off a little here, but that
>was the basic gist....

Some PDP-11s had this register, some did not.  It turns out
that the only time this was a problem was when an
auto-[in|de]crement *floating*point* instruction caused the
fault.  Unfortunately, DEC left this register out of the
PDP-11/60 (or was that the 11/44?), which implemented the
standard floating point instruction set.  Thus, for this
PDP-11 only, you had to do stack probes if you were going to
use *(double *)p++ into automatic storage.

Dennis Rockwell
Concurrent Engineering
Westford MA

firth@sei.cmu.edu (Robert Firth) (09/06/89)

In article <44908@bbn.COM> dswartz@BBN.COM (Dan Swartzendruber) writes:
>	[ ... ]		The PDP-11 had
>the same problem.  I seem to recall they solved it by having a diagnostic
>register in which the CPU wrote which registers had been incremented or
>decremented and by how much.

In article <2812@masscomp.UUCP> dennis@westford.ccur.com (Dennis Rockwell) writes:
>Some PDP-11s had this register, some did not.  It turns out
>that the only time this was a problem was when an
>auto-[in|de]crement *floating*point* instruction caused the
>fault.  Unfortunately, DEC left this register out of the
>PDP-11/60 (or was that the 11/44?), which implemented the
>standard floating point instruction set.  Thus, for this

The handbooks tell me that this register was implemented on all but one
of the memory-managed PDP-11s.  On the old PDP-11/45, it was called
Segment Status Register #1 (SSR1) and had the format:

	Bits 11..15 : amount changed
	Bits  8..10 : register changed
	Bits  3.. 7 : amount changed
	Bits  0.. 2 : register changed

On the later PDP-11s (11/44, 11/70) it was called Memory Management
Register #1 (MMR1).

The basic reason for doing it this way was to allow the fault handling
code to undo the side effects that might have occurred.  At most two
registers could have been changed, and at most by 8.  Note that the
register didn't tell you which register SET was currently in force:
you had to work that out for yourself using the various mode bits
scattered about the place.  You then restored the registers, reset the
PC to point to the start of the instruction (this value was squirrelled
away in MMR2 since you can't decode PDP-11 instructions backwards),
and off you went again.
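
As a rough illustration of the undo step just described, here is a
minimal sketch that decodes the SSR1/MMR1 layout given above and backs
out the recorded register changes. It assumes the 5-bit "amount changed"
fields are two's complement and that a zero amount means "no change";
the function names and the flat 8-entry register list are hypothetical,
and choosing the right register set (as noted above) is left out.

```python
def decode_ssr1(ssr1):
    """Return a list of (register, signed_delta) pairs recorded in SSR1/MMR1."""
    changes = []
    for shift in (0, 8):                   # low byte first, then high byte
        field = (ssr1 >> shift) & 0xFF
        reg = field & 0x7                  # bits 0..2 (or 8..10)
        amount = (field >> 3) & 0x1F       # bits 3..7 (or 11..15)
        if amount & 0x10:                  # assumed 5-bit two's complement
            amount -= 0x20
        if amount != 0:
            changes.append((reg, amount))
    return changes

def undo_side_effects(regs, ssr1, saved_pc):
    """Back out auto-increment/decrement changes and rewind the PC.

    regs:     list of 8 register values, R0..R7 (R7 is the PC).
    saved_pc: start-of-instruction address, as kept in MMR2.
    """
    restored = list(regs)
    for reg, delta in decode_ssr1(ssr1):
        restored[reg] = (restored[reg] - delta) & 0xFFFF
    restored[7] = saved_pc
    return restored
```

For example, an SSR1 value of 0xF612 would record R2 incremented by 2
and R6 decremented by 2 under these assumptions.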

If I recall aright, the fix on the PDP-11/24 was to keep the stack
double-word aligned.  A floating-point operation could then never
cause a memory-management abort halfway through.

henry@utzoo.uucp (Henry Spencer) (09/07/89)

In article <4008@bd.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>>... I seem to recall they solved it by having a diagnostic
>>register in which the CPU wrote which registers had been incremented or
>>decremented and by how much.
>
>>Some PDP-11s had this register, some did not...
>
>The handbooks tell me that this register was implemented on all but one
>of the memory-managed PDP-11s...

Unfortunately, not so:  your handbooks probably are not complete.  The
register appeared on the 45, the first memory-managed 11.  It was left
out on the 40, the second.  The 40's MMU was a cut-down version of the
rather kitchen-sink 45 design, since the 40 was a lower-cost machine,
but unfortunately they left out a couple of important things because no
DEC software of the time used them.  (The changed-registers register was
one, split-space was the other.)  The larger memory-managed 11s followed
the 45; the smaller ones followed the 40.  The 40, 34, 60, 23, and 24,
at least, had the brain-damaged MMU.  The 50, 55, and 70 had the 45 MMU,
but that was no big trick since they were all 45s with changes in memory
subsystem details.  The 44 had a *slightly* simplified 45 MMU that got
rid of some of the silliness but left everything important in.  I think
the more recent 11s have mostly followed the 44, but I haven't been
keeping track.
-- 
V7 /bin/mail source: 554 lines.|     Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (09/12/89)

In article <34228@grapevine.uucp> johnz@grapevine.uucp (John Zolnowsky ext. 33230) writes:
>
>Although virtual memory was desired for the 68000, it was deemed too costly
>to provide an instruction restart model.  This required many more temporary

>
>After the 68000 went to market, the instruction continuation model was
>conceived.  This is best understood as an interrupt at the microcode level.

>
>Presumably, in later processors from the family, the provision of extra
>hardware to reduce instruction cycle counts also leads to a reduction
>in the indeterminacy of the values of registers.  This makes the cost
>of instruction restart more tractable.
>
So what you're saying is that motorola first thought that virtual
memory could only be supported through instruction restart, and only
later on conceived the concept of instruction continuation.

The point is still this: why go to instruction restart seeing that

	(1) you _can't_ provide virtual (memory-mapped) i/o with instruction 
	    restart (this is the virtual machine part that can't be supported)

	(2) you can have big problems with memory-mapped i/o based
	    on reads

Instruction restart appears to be _less_ powerful than instruction 
continuation. Just look at the algorithm recently defined here to
prevent instruction restart from doing multiple read accesses. 
-- 
wiskunde: Dutch for mathematics. Literally: Knowledge of certainty   
wis: certainty			kunde: 	Knowledge
Roelof Vuurboom  SSP/V3   Philips TDS Apeldoorn, The Netherlands   +31 55 432226
domain: roelof@idca.tds.philips.nl             uucp:  ...!mcvax!philapd!roelof

roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (09/12/89)

In article <1790@cs-spool.calgary.UUCP> deraadt@enme3.UUCP (Theo Deraadt) writes:
|In article <3267@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes:
|>Now, my point:  how about shared memory?  (SysV-type shared memory, not
|>multi-processor-type shared memory.)  With instruction restart, the value in
|>$300 could have changed, while, with instruction continuation, it doesn't
|>matter.  How do various OS's and hardwares handle it?
|Gad. How about a hardware FIFO? How about a serial receive buffer on
|your generic serial chip? I really doubt this rumour, unless some special
|trick (like maybe modifying the actual instruction in the code cache
|before the restart to skip the already done part, yes, sorry, sick idea)
|was to be done, it just has too many differences from the current 030
|and 020/851 setup to be possible.
| <tdr.
>
I'm pretty sure the 68040 will use instruction restart and yes, some read
accesses can occur more than once because of the instruction restart
model. What the solution is for your hardware fifo is something I'm
still trying to figure out :-)

I don't see the shared memory as a real problem since if you had accessed
that location just an instant later the value would have changed anyway.
I doubt that any application would depend on such microsecond timings.
-- 
wiskunde: Dutch for mathematics. Literally: Knowledge of certainty   
wis: certainty			kunde: 	Knowledge
Roelof Vuurboom  SSP/V3   Philips TDS Apeldoorn, The Netherlands   +31 55 432226
domain: roelof@idca.tds.philips.nl             uucp:  ...!mcvax!philapd!roelof

jonah@db.toronto.edu (Jeffrey Lee) (09/12/89)

roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes:
>In article <1790@cs-spool.calgary.UUCP> deraadt@enme3.UUCP (Theo Deraadt) writes:
>|In article <3267@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes:
>|Gad. How about a hardware FIFO? How about a serial receive buffer on
>|your generic serial chip? I really doubt this rumour, unless some special
>       What the solution is for your hardware fifo is something I'm
>still trying to figure out :-)

Most of the problems arise from the use of memory-memory instructions.
You SHOULD be OK if you write code that accesses ``critical'' locations
using only a RISC sub-set of the CISC instruction set: register-register
operations and single-memory-address move instructions.  That is,
convert:

	move fifo,addr1
	add  fifo,addr2

into:

	move fifo,reg
	move reg,addr1
	move fifo,reg1
	move addr2,reg2
	add  reg1,reg2
	move reg2,addr2

You can get away with:

	move fifo,reg
	move reg,addr1
	move fifo,reg
	add  reg,addr2

if addr2 is not a critical variable or the add instruction uses
guaranteed atomic read-modify-write access.

A respectable processor should NOT restart a move instruction with just
one memory operand if the read/write has succeeded.  Therefore an
exception should not be able to cause the processor to restart the
instruction and re-read the data.  [No bets if the processor has a deep
pipeline--try checking the hardware reference manuals or calling the
manufacturer.]
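
A minimal sketch of why the split helps, assuming a toy instruction
restart model in which a faulting instruction is simply re-executed from
the top. The Memory class, PageFault exception and helper names are
invented for illustration; the FIFO is modelled as a deque whose reads
are destructive.

```python
from collections import deque

class PageFault(Exception):
    pass

class Memory:
    """One destination word whose first write faults (page not yet resident)."""
    def __init__(self):
        self.resident = False
        self.value = None

    def write(self, value):
        if not self.resident:
            self.resident = True       # stands in for the fault handler
            raise PageFault()
        self.value = value

def run_restartable(instr):
    """Instruction restart: on a fault, re-execute the whole instruction."""
    while True:
        try:
            return instr()
        except PageFault:
            continue

def mem_to_mem(fifo, dest):
    """move fifo,addr1 as a single instruction: FIFO read and write together."""
    def move_fifo_to_dest():
        dest.write(fifo.popleft())     # destructive read, then the write
    run_restartable(move_fifo_to_dest)

def via_register(fifo, dest):
    """move fifo,reg followed by move reg,addr1."""
    reg = run_restartable(fifo.popleft)
    run_restartable(lambda: dest.write(reg))
```

Starting from a FIFO holding 1, 2, 3, mem_to_mem stores 2 and silently
drops 1, because the restart reads the FIFO a second time; via_register
stores 1, since only the register-to-memory move is re-executed.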

j.

BEAR@S34.Prime.COM (09/12/89)

It would appear that what this thread is *really* discussing is whether or
not MEMORY MAPPED I/O is a good thing. The consensus seems to be that it
should be avoided (use I/O instructions instead). If it can't be avoided,
be careful.

It may of course be that a particular machine has no special I/O instructions
(e.g. Acme RISC :-)), in which case you should "do the right thing" (most
likely a load or store).

Bob Beckwith
Prime Computer, Inc.
(508)879-2960 x 4209
bear@s34.prime.com

baum@Apple.COM (Allen J. Baum) (09/14/89)

[]
>In article <261500010@S34.Prime.COM> BEAR@S34.Prime.COM writes:
>
>It would appear that what this thread is *really* discussing is whether or
>not MEMORY MAPPED I/O is a good thing. The consensus seems to be that it
>should be avoided (use I/O instructions instead). If it can't be avoided,
>be careful.

OK, I'll bite. What are the characteristics of memory mapped I/O that enable
it to avoid the problems we are talking about? Note that I am assuming that
memory mapped I/O is done with simple Load/Store instructions otherwise, and
not hairy mem-mem translate&test&stand-on-your-head instructions.

--
		  baum@apple.com		(408)974-3385
{decwrl,hplabs}!amdahl!apple!baum

les@unicads.UUCP (Les Milash) (09/14/89)

In article <34701@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes:
>>In article <261500010@S34.Prime.COM> BEAR@S34.Prime.COM writes:
>>*really* discussing is whether or
>>not MEMORY MAPPED I/O is a good thing. The consensus seems to be that it
>>should be avoided

>OK, I'll bite. What are the characteristics of memory mapped I/O that enable
>it to avoid the problems we are talking about?

i'll summarize (and i'm sure y'all will correct me if i'm wrong:-)
memory mapped i/o has to be "memory-like" since processors will often assume
that stuff in the "memory space" is memory-like, even to the point of calling
its "memory space" virtual and translating to physical.

memory-like devices have the property that if you inquire their value multiple
times all you get is their value (multiple times).  restart is not a problem.
fifos are not memory-like; reading them causes all kinds of side effects
in them including that their value gets forgotten once you've read it (specifically
they are channel-like (in CSP vernacular))

another lesson recently learned from this newsgroup is that i/o devices that
you can write but not read (like cmd registers to some XXX controller chip)
also are a pain in the ass (but not in the virtual ass; it's the driver 
writer's ass that gets bit).

right?

Les Milash

baum@Apple.COM (Allen J. Baum) (09/15/89)

[]
>In article <261500010@S34.Prime.COM> BEAR@S34.Prime.COM writes:
>We are *really* discussing whether or not MEMORY MAPPED I/O is a good thing

So, I said "OK, I'll bite. What are the characteristics of memory mapped I/O
that enable it to avoid the problems we are talking about?"

Of course, what I really meant was: what is it about real I/O, as opposed to
memory mapped I/O, that solves the problems?

Oops.
--
		  baum@apple.com		(408)974-3385
{decwrl,hplabs}!amdahl!apple!baum

bruce@tolerant.UUCP (Bruce Hochuli) (09/15/89)

In article <642@unicads.UUCP> les@unicads.UUCP (Les Milash) writes:
:In article <34701@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes:
:>>In article <261500010@S34.Prime.COM> BEAR@S34.Prime.COM writes:
:>>*really* discussing is whether or
:>>not MEMORY MAPPED I/O is a good thing. The consensus seems to be that it
:>>should be avoided
:
:>OK, I'll bite. What are the characteristics of memory mapped I/O that enable
:>it to avoid the problems we are talking about?
:
:i'll summarize (and i'm sure y'all will correct me if i'm wrong:-)
:memory mapped i/o has to be "memory-like" since processors will often assume
:that stuff in the "memory space" is memory-like, even to the point of calling
:its "memory space" virtual and translating to physical.
:
:memory-like devices have the property that if you inquire their value multiple
:times all you get is their value (multiple times).  restart is not a problem.
:fifos are not memory-like; reading them causes all kinds of side effects
:in them including that their value gets forgotten once you've read it (specifically
:they are channel-like (in CSP vernacular))
:
Stuff deleted!

Seems to me that this conversation just took a weird turn. The issue here
is not memory mapped vs. I/O mapped, but just what devices are we dealing
with.

One example: if I have an Intel 8254 out there, I have to make my
accesses in a very particular order. I issue a command (access 1)
and I read/write my data (access 2). If I reissue access 1, I will
have a very confused counter/timer out there. Note that this has 
nothing whatever to do with memory or I/O mapping. The same 
example holds true for lots of devices that a designer might
hang on a bus.

Back to the larger issue, I still don't understand how re-issuing
an instruction could avoid having to face this issue.

hascall@atanasoff.cs.iastate.edu (John Hascall) (09/15/89)

In article <642@unicads.UUCP> les@unicads.UUCP (Les Milash) writes:
}In article <34701@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes:
}>>In article <261500010@S34.Prime.COM> BEAR@S34.Prime.COM writes:
}>>*really* discussing is whether or
}>>not MEMORY MAPPED I/O is a good thing. The consensus seems to be that it
}>>should be avoided
 
}memory-like devices have the property that if you inquire their value multiple
}times all you get is their value (multiple times).  restart is not a problem.
}fifos are not memory-like; reading them causes all kinds of side effects
}in them including that their value gets forgotten once you've read it (specifically
}they are channel-like (in CSP vernacular))
 

    I guess I fail to see the problem.  I agree that for many I/O devices
    re-reading a device-register is a bad thing.  What I don't see is how
    this can happen except when:

	 a) you have an instruction (or instr. set) which is restarted
               *and*
         b) you have an instruction that reads from two (or more)
	    operands (and the I/O location is not the last one read?).

    Are there machines which can get a page-fault accessing a memory-mapped
    I/O device register location?? (surely not!)

    Examples using the VAX instruction set (write operands are rightmost):

         MOVW   IO_DEV_CSR,R0         ; no problem: no page faults in I/O space
				      ;  (even if MOVW was a restarted instr)
	 SUBW3  IO_DEV_CSR,(R2)+,R3   ; no problem: SUBW3 is continued
         SUBW3  IO_DEV_CSR,@(R2)+,R3  ; trouble: @(R2)+ can cause page fault
	 SUBW3  @(R2)+,IO_DEV_CSR,R3  ; can you get away with this because the
				      ; I/O operand is read last?? 

    I just can't see where you would use such weird instructions in a
    device driver when accessing memory-mapped I/O registers (even in a
    multiple-memory-accesses-per-instruction machine like the VAX).

John Hascall
Systems Group
ISU Comp Center

melvin@ucbarpa.Berkeley.EDU (Steve Melvin) (09/15/89)

In article <1516@atanasoff.cs.iastate.edu> hascall@atanasoff.cs.iastate.edu.UUCP (John Hascall) writes:
>    I guess I fail to see the problem.  I agree that for many I/O devices
>    re-reading a device-register is a bad thing.  What I don't see is how
>    this can happen except when:
>
>	 a) you have an instruction (or instr. set) which is restarted
>               *and*
>         b) you have an instruction that reads from two (or more)
>	    operands (and the I/O location is not the last one read?).
>
>    Are there machines which can get a page-fault accessing a memory-mapped
>    I/O device register location?? (surely not!)
>
>    Examples using the VAX instruction set (write operands are rightmost):
>
>         MOVW   IO_DEV_CSR,R0         ; no problem: no page faults in I/O space
>				      ;  (even if MOVW was a restarted instr)

The reason this works and seems not to be a problem is that the hardware
designers have gone to some trouble to make it work.  Consider what happens
at the microarchitecture level and I think you'll agree that it really is
a problem.  Let's stick with this instruction and talk about the
VAX 8600 implementation.  When the instruction unit sees the opcode for the
MOVW instruction, the execution unit could still be two instructions
behind.

What happens is the following: the instruction unit decodes the
first operand and generates a virtual address memory read request to the
memory unit.  The memory unit then translates this virtual address (assuming
it's not busy with another request) into a physical address using the
translation buffer.  Assuming a TB hit, then at this point the memory unit
recognizes that it is an I/O address (the I/O space is recognizable from
the physical address, in this case if bit 29 (the MSB) is high).  Since
the execution of all previous instructions has not yet completed at this
point, the memory unit disregards the request and waits for it to be
re-issued when the execution unit catches up.  No further pre-fetching
can occur and the pipeline is drained.
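
The decision the memory unit makes in that description can be sketched
roughly as follows. Only the bit-29 test comes from the text above; the
function names, the boolean "older instructions pending" flag and the
read_memory callback are hypothetical simplifications, not the 8600's
actual interface.

```python
IO_SPACE_BIT = 1 << 29   # per the text: physical address bit 29 set => I/O space

def is_io_address(phys_addr):
    """True if a translated physical address falls in I/O space."""
    return bool(phys_addr & IO_SPACE_BIT)

def handle_prefetch_read(phys_addr, older_instructions_pending, read_memory):
    """Service an operand prefetch, or drop it so it can be re-issued later.

    Returns the data read, or None when the request is disregarded because
    it targets I/O space while earlier instructions could still fault.
    """
    if is_io_address(phys_addr) and older_instructions_pending:
        return None                    # no side effect on the device yet
    return read_memory(phys_addr)
```

Ordinary memory reads go through immediately; an I/O read is performed
only once nothing older in the pipeline can still take an exception.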

The point is that if a previous instruction faults, let's say a page fault
on a destination write, which will not be detected until the very end of the
instruction, the read for the MOVW must not have taken place.  If the I/O
instruction had been recognizable from the opcode (as in my opinion it
should be), the microarchitects could have designed a simpler memory unit
that assumed any prefetch read from a non-I/O instruction is OK.

Also consider that this is a simple example.  In a more heavily pipelined
machine, with perhaps even out-of-order prefetching of operands, it gets
even harder to guarantee that these reads don't occur.  It basically means
that address translation for all reads must occur in order, with a microtrap
mechanism to back out when an I/O address is encountered.  Since the person
writing the device driver or other code that touches I/O registers generally
knows which variables map to I/O space, why not just have them use a
different instruction?  Then, the microarchitecture can much more cleanly
enter and exit this synchronization point.

Steve Melvin
University of California, Berkeley
melvin@arpa.Berkeley.EDU				...!ucbvax!melvin

stevew@wyse.wyse.com (Steve Wilson xttemp dept303) (09/15/89)

In article <642@unicads.UUCP> les@unicads.UUCP (Les Milash) writes:
>i'll summarize (and i'm sure y'all will correct me if i'm wrong:-)
>memory mapped i/o has to be "memory-like" since processors will often assume
>that stuff in the "memory space" is memory-like, even to the point of calling
>its "memory space" virtual and translating to physical.
>

The definition I've always heard/used for memory-mapped I/O was one
which implied that all control to the device was done via memory addresses,
i.e. not using any special I/O instructions such as in/out on the 80x86 line.
Therefore, any I/O device that is hooked up to a micro such as a 68K would
by definition have to be "memory-mapped" since the 68K doesn't have
provisions for in/out instructions.  The device will react to the decode
of some specific address range presented by the processor.  I've never
heard of the device being "memory-like" as part of the definition
of this term.

Steve Wilson

kquick@simpact.com (Kevin Quick, Simpact Assoc., Inc.) (09/16/89)

In article <1516@atanasoff.cs.iastate.edu>, hascall@atanasoff.cs.iastate.edu
(John Hascall) writes:
>     Examples using the VAX instruction set (write operands are rightmost):
>
>          MOVW   IO_DEV_CSR,R0         ; no problem: no page faults in I/O space
> 				      ;  (even if MOVW was a restarted instr)
> 	 SUBW3  IO_DEV_CSR,(R2)+,R3   ; no problem: SUBW3 is continued
>          SUBW3  IO_DEV_CSR,@(R2)+,R3  ; trouble: @(R2)+ can cause page fault
> 	 SUBW3  @(R2)+,IO_DEV_CSR,R3  ; can you get away with this because the
> 				      ; I/O operand is read last??
>
>     I just can't see where you would use such weird instructions in a
>     device driver when accessing memory-mapped I/O registers (even in a
>     multiple-memory-accesses-per-instruction machine like the VAX).
>
> John Hascall
> Systems Group
> ISU Comp Center

The above instruction examples do show the possible problems involved in
restarting vs continuing instructions when accessing device registers, but
the third and fourth instructions above are usually protected (under VAX/VMS)
in another fashion, namely the Interrupt Priority Level (IPL).

On the VAX, setting the IPL processor register to a value between 0 and 31
blocks all interrupts at or below that level.  A blocked interrupt stays
pending until the IPL drops below the level at which it was requested.

This matters because the VAX/VMS programming convention (read law, since
if you break it, bad things are very likely to happen to you) is that you must
be at IPL 20 or higher to touch device registers (memory-mapped locations).
In relating this to the third and fourth instructions presented by Mr. Hascall,
it is observed that a VAX/VMS page fault will occur at IPL 2, and is therefore
prevented when accessing device registers at IPL 20.  If Mr. Hascall's example
above did actually generate a page fault, the machine would generate an
invalid exception and a bugcheck ---> crashdump analysis and reboot time.
(VAX/VMS drivers typically operate with system non-paged memory at high IPLs).

As a final note with regards to this discussion, it is the general practice
of device drivers on VAX/VMS to access device registers at IPL 20 or above,
but to perform most of their processing at a lower IPL (typically 8).  Thus,
a driver wishing to read a device register as above would raise the IPL to 20,
read the device register into an unused register, and then return to IPL 8
to continue processing, i.e. the device register is touched ONCE for the
entire process, and it is the driver's responsibility not to lose the value
obtained, since it probably cannot be reread from the device.
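
The raise/read-once/restore pattern can be sketched in C as a simulation.
Everything here is a stand-in: set_ipl(), the fake CSR and its
clear-on-read behaviour are assumptions for illustration, and a real VMS
driver would use the system's own IPL primitives and a real device address.

```c
#include <stdint.h>

/* Simulated processor state and device; all names are hypothetical. */
static int current_ipl = 8;                 /* normal driver processing level */
static volatile uint16_t fake_csr = 0x8080; /* pretend device status register */
static unsigned csr_reads = 0;              /* how often the device was touched */

/* Set a new IPL and return the previous one so it can be restored. */
static int set_ipl(int new_ipl)
{
    int old = current_ipl;
    current_ipl = new_ipl;
    return old;
}

/* Model a register whose contents are lost once read. */
static uint16_t read_csr_destructive(void)
{
    csr_reads++;
    uint16_t value = fake_csr;
    fake_csr = 0;
    return value;
}

/* Raise IPL, touch the register exactly once, restore IPL, and hand the
   snapshot back to the caller, who must not lose it. */
uint16_t sample_csr_once(void)
{
    int saved = set_ipl(20);
    uint16_t snapshot = read_csr_destructive();
    set_ipl(saved);
    return snapshot;
}
```

The caller works on the returned snapshot at the lower IPL; calling
sample_csr_once() a second time returns 0 in this model because the first
read consumed the value.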

My apologies if this seems overly pedantic or is more machine specific than
this discussion/forum warrants, but I wanted to shed some light on this one
aspect of the restarted vs. continued instruction discussion.

	-- Kevin Quick,  Simpact Associates, Inc.,  San Diego, CA.
	   Internet: simpact!kquick@crash.cts.com

mash@mips.COM (John Mashey) (09/16/89)

In article <31316@ucbvax.BERKELEY.EDU> melvin@ucbarpa.Berkeley.EDU.UUCP (Steve Melvin) writes:
....
>>    Examples using the VAX instruction set (write operands are rightmost):
>>
>>         MOVW   IO_DEV_CSR,R0         ; no problem: no page faults in I/O space
>>				      ;  (even if MOVW was a restarted instr)
...
>Also consider that this is a simple example, in a more heavily pipelined
>machine, with perhaps even out-of-order prefetching of operands, it gets
>even harder to guarantee that these reads don't occur, it basically means
>that address translation for all reads must occur in order with a microtrap
>mechanism to back out when an I/O address is encountered.  Since the person
>writing the device driver or other code that touches I/O registers generally
>knows which variables map to I/O space, why not just have them use a
>different instruction?  Then, the microarchitecture can much more cleanly
>enter and exit this synchronization point.

Note that this whole issue is not (just) a hardware issue, it's a:
	hardware instruction-level
	hardware micro-architecture
	language definition
	compiler technology
and	operating system
issue; and it's IMPORTANT to understand how these all fit together.
For example:
1) Some people like to write their device drivers in a language higher
than assembler.  Hence they do not directly choose instructions,
and if the code generator needs to do something different for memory-mapped
I/O, it needs to know that.
2) Even on a simple load/store RISC machine, a global optimizer can
surprise you by rearranging things; the continuation issue, and the
dealing-with-optimizer issue may not look practically different from
the system programmer's view, i.e., they could be surprised either way.
3) Most languages don't even have methods for telling an optimizer
to be careful.  C's volatile is a useful exception.
4) Systems and chips are different.  One may well build a system by
choosing/designing I/O controllers that have "good" properties.  On the
other hand, chips expected to be used in many different ways need to
survive all kinds of odd behaviors. A classic reference here is
by Tom Lyon & Joe Skudlarek of Sun: "All the Chips That Fit",
UNIX Review 4, 2 (Feb 1989), 29-34. (Earlier version in Summer 1985 USENIX).
This is subtitled:
	"Semiconductor manufacturers continue to heap feature upon
	feature, so mama, don't let your babies grow up to be
	system software engineers."
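
As an illustration of points 1-3, here is a minimal C sketch of a driver
routine written in a higher-level language, with volatile telling the
optimizer to be careful.  The register layout (struct uart_regs), the
UART_RX_READY bit and uart_try_read() are hypothetical; in a real driver
the structure would be overlaid on the device's address rather than on
ordinary memory.

```c
#include <stdint.h>

/* Hypothetical register layout for a byte-wide serial device. */
struct uart_regs {
    volatile uint8_t data;    /* reading this consumes a received byte */
    volatile uint8_t status;  /* bit 0 is set when a byte is waiting */
};

#define UART_RX_READY 0x01u

/* Returns 1 and stores the byte in *out if one is waiting, else 0.
   Because the fields are volatile, the compiler must perform one load
   of status and at most one load of data per call, in this order. */
int uart_try_read(struct uart_regs *uart, uint8_t *out)
{
    if ((uart->status & UART_RX_READY) == 0)
        return 0;
    *out = uart->data;
    return 1;
}
```

Without volatile, an optimizer would be free to merge or drop the
accesses, which is the surprise described in point 2.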
-------
Attributes that make life simpler in machines that use memory-mapped I/O:
1) Load/store architecture, specifically, no more than one load or
	store per instruction, required to be on naturally-aligned
	boundaries, hence fast pipeline with no surprises.
	Include all of the 8-16-32 bit accesses as normal instructions,
	else some devices that must be dealt with can give surprises.
	For example, it is not good enough to do load-words, and then
	extract bytes, as you can cause problems with some device
	registers by issuing extra accesses.
2) If you use global-optimizing compilers, you need (in C) volatile,
	or some equivalent elsewhere.  This has to work "right", where
	"right" turns out to be: after optimization, the exact same
	number of loads and stores to volatile variables must occur,
	in exactly the same order, as before such optimization.  Anything
	less than that leads to crazed systems programmers.
3) Be careful of buffering.  For example, some MIPS-based systems use
	a 4-deep write-buffer that provides read-around, i.e., reads have
	priority over writes, and hence, you can end up doing a write to
	a control register, and possibly then reading the associated
	status register while the write is still pending.  (We use a
	kernel function wbflush() that waits until the write buffer is
	empty.)  This is OK and works; however some of the newer systems
	use write-flushing, i.e., a read stalls until all of the writes
	are done, and this is clearly easier to use, although there is
	little difference in performance (stalls are stalls, no matter
	what).  In particular, it almost seems like uncached references
	in hardware are like volatile in software: a good default is to
	stall and make sure the state is clean.
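
The read-around hazard in point 3 can be sketched as a small C simulation.
This is not the MIPS kernel's code: the 4-deep buffer, the fake control
and status registers, and this wbflush() body are assumptions that only
model the behaviour described above.

```c
#include <stddef.h>
#include <stdint.h>

#define WB_DEPTH 4

struct pending_write {
    volatile uint32_t *addr;
    uint32_t value;
};

static struct pending_write wb[WB_DEPTH];
static size_t wb_count = 0;

/* Simulated device: once a control write arrives, status mirrors it. */
static volatile uint32_t dev_control = 0;
static volatile uint32_t dev_status = 0;

/* Drain every pending write to the "bus", oldest first. */
void wbflush(void)
{
    for (size_t i = 0; i < wb_count; i++) {
        *wb[i].addr = wb[i].value;
        if (wb[i].addr == &dev_control)
            dev_status = wb[i].value;   /* the device reacts to the write */
    }
    wb_count = 0;
}

/* Writes are queued; a full buffer is drained before accepting more. */
void buffered_write(volatile uint32_t *addr, uint32_t value)
{
    if (wb_count == WB_DEPTH)
        wbflush();
    wb[wb_count].addr = addr;
    wb[wb_count].value = value;
    wb_count++;
}

/* Reads have priority: they go straight to the device, around the buffer. */
uint32_t read_around(const volatile uint32_t *addr)
{
    return *addr;
}
```

In this model a status read issued right after buffered_write() still
sees the old value; calling wbflush() first makes the write visible.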
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

hascall@atanasoff.cs.iastate.edu (John Hascall) (09/16/89)

In article <?> kquick@simpact.com (Kevin Quick, Simpact Assoc., Inc.) writes:
}In article <?>, hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
}>     Examples using the VAX instruction set (write operands are rightmost):
 
}>       MOVW   IO_DEV_CSR,R0         ; no problem: no page faults in I/O space
}> 				      ;  (even if MOVW was a restarted instr)
}> 	 SUBW3  IO_DEV_CSR,(R2)+,R3   ; no problem: SUBW3 is continued
}>       SUBW3  IO_DEV_CSR,@(R2)+,R3  ; trouble: @(R2)+ can cause page fault
}> 	 SUBW3  @(R2)+,IO_DEV_CSR,R3  ; can you get away with this because the
}> 				      ; I/O operand is read last??

}The above instruction examples do show the possible problems involved in
}restarting vs continuing instructions when accessing device registers, but
}the third and fourth instructions above are usually protected (under VAX/VMS)
}in another fashion, namely the Interrupt Priority Level (IPL).

  Mostly, raising IPL protects you from critical section problems (talking
  uni-processor here).  Section 6.2 of "Writing a Device Driver for VAX/VMS"
  states (item 7):

       To access I/O space, use only the following instructions.  These
       instructions cannot be interrupted unless they use autoincrement-
       deferred addressing mode or any of the displacement-deferred modes
       when specifying an operand.
 
}In relating this to the third and fourth instructions presented by Mr. Hascall,
}it is observed that a VAX/VMS page fault will occur at IPL 2, and is therefore
}prevented when accessing device registers at IPL 20.

       Page faults are not allowed above IPL 2, not really for architectural
 reasons, but because of the critical section problem and because VMS does
 process scheduling at IPL 3 (there may not be a process context in
 which to make the page valid).
 
       Anyway, the use of VAX instructions was purely for my own convenience.
 My point was (as was restated by Mr. Quick) that, regardless of the
 instructions available, most/all device drivers only need to access (memory
 mapped) device registers in a simple fashion.  Thus, I don't think memory
 mapped I/O is
 somehow less flexible than having specific I/O instructions as was suggested
 earlier even if certain instructions and/or addressing modes have to be
 prohibited.

John Hascall

melvin@ucbarpa.Berkeley.EDU (Steve Melvin) (09/18/89)

In article <27633@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>Note that this whole issue is not (just) a hardware issue, it's a:
>	hardware instruction-level
>	hardware micro-architecture
>	language definition
>	compiler technology
>and	operating system
>issue; and it's IMPORTANT to understand how these all fit together.
>...
>Attributes that make life simpler in machines that use memory-mapped I/O:
>1) Load/store architecture,
>...

Your point is well taken, there are many sides to this issue, but I don't
think it's fair to say that load/store *architectures* make life simpler for
systems programmers; using simple loads and stores is pretty much of a
requirement as has been pointed out, regardless of whether memory to memory
instructions exist.  But which instructions are used and what restrictions
need to be placed on them is secondary to the real issue here.  The bottom
line for an I/O instruction is that it represents a synchronization point from
the perspective of the hardware.  That is, all unconfirmed operations have to
be verified before the I/O operation can take place.  All predicted branches
have to be confirmed, all pending memory reads and writes have to at least be
translated to verify that they can be completed and all operations that can
generate exceptions have to be executed.  Generally this means that the entire
pipeline has to be drained.  This is a simple fact, there is no way around it
(at least not as long as reads have side-effects and there is no "undo"
function.)

However, if the mere fact that you have to handle these synchronization points
correctly (which are few and far between) slows down the other 99.9% of your
code, something is wrong.  In low concurrency machines, memory mapped I/O
isn't a big deal in this regard because it doesn't slow down non-I/O
code.  Just go ahead and use ordinary instructions with ordinary virtual
addresses (with appropriate restrictions on number and type of operands,
as has been discussed) and let the hardware figure out that it has an I/O
instruction when it sees the address.

However, in high concurrency microarchitectures which execute multiple
operations per cycle and in an order determined at run-time (these processors
are coming, BTW) there has to be a more explicit way to let the hardware know
about an I/O instruction.  Simply using ordinary instructions with ordinary
addresses, and expecting the hardware to do the right thing won't work.
You can't expect to get maximal speedup on the code that doesn't know or care
about I/O if the hardware has to guarantee that it doesn't trip across a
synchronization point in the middle of some basic block that it is executing
out-of-order.

There are many possible ways to do this and memory mapped I/O could still
be incorporated into such a solution.  My only point is that presenting a
memory model to the hardware in which reads don't have side-effects is of
critical importance in high concurrency designs.  (Of lesser importance but
also of value is the property of multiple writes (i.e. a write of an
incorrect value can take place and the correct value can be later written.))
I think that the increase in performance will win out whatever reduction in
convenience this implies and people will figure out whatever has to be
figured out at the higher levels in order to allow the hardware to make
these assumptions unless explicitly told otherwise (i.e. BEFORE address
translation: part of the opcode, surrounded by special instructions, etc.).
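
One way to picture "surrounded by special instructions" at the source
level is a pair of explicit I/O accessors, sketched here in C.  The names
io_sync(), io_read32() and io_write32() are hypothetical, and the C11
fence is only a stand-in: it constrains the compiler and ordinary memory
ordering, whereas a real machine would need its own architecture-specific
synchronizing instruction at that spot.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Placeholder for a hardware "synchronize here" instruction (assumption). */
static inline void io_sync(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* Every I/O read is bracketed explicitly, so the rest of the code can be
   compiled and executed under the assumption that reads are side-effect
   free. */
static inline uint32_t io_read32(const volatile uint32_t *reg)
{
    io_sync();
    uint32_t value = *reg;
    io_sync();
    return value;
}

static inline void io_write32(volatile uint32_t *reg, uint32_t value)
{
    io_sync();
    *reg = value;
    io_sync();
}
```

The design point is that the synchronization cost is paid only where the
programmer names an I/O access, not on every ordinary load and store.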

----
Steve Melvin
University of California, Berkeley
----

ok@cs.mu.oz.au (Richard O'Keefe) (09/18/89)

There is another problem with memory-mapped I/O instructions; I guess this
site must be losing some messages because I don't recall seeing it mentioned.
If I do
	movb	InputPort, r0
where InputPort is a memory-mapped device port, and then a few instructions
later do
	movb	InputPort, r0
again, I really don't want the second reference to look in the cache and
return whatever the first reference returned.  Presumably the VAX handles
this by ensuring that addresses in the "device" range are never cached.
A simpler approach could be to have I/O instructions.  Note that "memory
mapped I/O" has two faces:
    -- device registers appear as memory locations TO THE CPU
    -- device registers appear as memory locations TO THE BUS
It would be possible to have a machine with special
	input	DeviceAddress, Register
	output	DeviceAddress, Register
instructions which the CPU, cache, and so on "knew" about, but which
appeared ON THE BUS just like memory references that miss the cache.
Whether that would be a good thing is another question again.
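
The stale-read problem and the "never cache the device range" fix can be
sketched as a toy C model.  The 16-byte address space, IO_BASE and the
function names are assumptions for illustration; this does not model the
VAX or any real cache.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEM_BYTES 16
#define IO_BASE   12   /* hypothetical: addresses 12..15 are device registers */

static uint8_t backing[MEM_BYTES];      /* memory plus device registers */
static uint8_t cache_data[MEM_BYTES];
static bool    cache_valid[MEM_BYTES];

/* The device places a new byte in one of its registers. */
void device_deliver(unsigned addr, uint8_t byte)
{
    backing[addr] = byte;
}

/* Naive load: once an address is cached, later loads never reach the bus. */
uint8_t load_cached(unsigned addr)
{
    if (!cache_valid[addr]) {
        cache_data[addr] = backing[addr];
        cache_valid[addr] = true;
    }
    return cache_data[addr];
}

/* Load that knows about the device range and always goes to the bus there. */
uint8_t load(unsigned addr)
{
    if (addr >= IO_BASE)
        return backing[addr];
    return load_cached(addr);
}
```

With load_cached(), the second read of the input port returns whatever the
first one returned; load() sees the new byte because the device range
bypasses the cache.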

johnl@esegue.segue.boston.ma.us (John R. Levine) (09/20/89)

In article <2128@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes:
>...  Note that "memory mapped I/O" has two faces:
>    -- device registers appear as memory locations TO THE CPU
>    -- device registers appear as memory locations TO THE BUS
>It would be possible to have a machine with special
>	input	DeviceAddress, Register
>	output	DeviceAddress, Register
>instructions which the CPU, cache, and so on "knew" about, but which
>appeared ON THE BUS just like memory references that miss the cache.
>Whether that would be a good thing is another question again.

It's a fine idea.  On the 8086 and its descendants, hence on the IBM PC,
excuse me, Industry Standard Architecture, bus, I/O cycles and memory cycles
are the same except for a line that says whether it's an I/O or a memory
address.  I/O addresses are never cached, of course.

The PDP-11 was the first machine to use memory-mapped I/O (the first one I
know about, anyway.)  By convention, all device registers were mapped in the
highest 8K bytes of the address space, and caches knew not to cache
addresses in that range.  On the Q-Bus, the second version of the PDP-11
bus, there is even a line that says that the current address is in the top
8K.  They intended it to make it easier to decode device addresses, but it
is equally useful to distinguish between I/O and memory.
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 492 3869
johnl@esegue.segue.boston.ma.us, {ima|lotus}!esegue!johnl, Levine@YALE.edu
Massachusetts has 64 licensed drivers who are over 100 years old.  -The Globe

peter@ficc.uu.net (Peter da Silva) (09/25/89)

In article <5876@tolerant.UUCP>, bruce@tolerant.UUCP (Bruce Hochuli) writes:
> Back to the larger issue, I still don't understand how re-issuing
> an instruction could avoid having to face [problems with memory mapped
> I/O].

What is the characteristic that makes "I/O mapped" I/O (that is, I/O that
uses special instructions to access the I/O address space) safe? The
characteristic is that the instruction can not fault in such a way that
an I/O operation is performed twice. That is, "I/O mapped" instructions
are inherently load/store.

What is the characteristic that makes memory mapped I/O dangerous? That
if a memory-memory instruction with an I/O device at one end faults, the
I/O operation can be duplicated. The simple solution is... don't perform
memory-memory instructions on I/O. Fairly easy in CISC, and simple in most
RISC architectures... they don't have memory-memory operations.
-- 
Peter da Silva, *NIX support guy @ Ferranti International Controls Corporation.
Biz: peter@ficc.uu.net, +1 713 274 5180. Fun: peter@sugar.hackercorp.com. `-_-'
"That is not the Usenet tradition, but it's a solidly-entrenched            U
 delusion now." -- brian@ucsd.Edu (Brian Kantor)

peter@ficc.uu.net (Peter da Silva) (09/25/89)

Taking the other side now...

In article <1516@atanasoff.cs.iastate.edu>, hascall@atanasoff.cs.iastate.edu (John Hascall) writes:
>          SUBW3  IO_DEV_CSR,@(R2)+,R3  ; trouble: @(R2)+ can cause page fault

>     I just can't see where you would use such weird instructions in a
>     device driver when accessing memory-mapped I/O registers (even in a
>     multiple-memory-accesses-per-instruction machine like the VAX).

If the device-driver is written in a high-level language, perhaps?
-- 
Peter da Silva, *NIX support guy @ Ferranti International Controls Corporation.
Biz: peter@ficc.uu.net, +1 713 274 5180. Fun: peter@sugar.hackercorp.com. `-_-'
"That is not the Usenet tradition, but it's a solidly-entrenched            U
 delusion now." -- brian@ucsd.Edu (Brian Kantor)

stevew@wyse.wyse.com (Steve Wilson xttemp dept303) (09/28/89)

In article <6283@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>In article <5876@tolerant.UUCP>, bruce@tolerant.UUCP (Bruce Hochuli) writes:
>> Back to the larger issue, I still don't understand how re-issuing
>> an instruction could avoid having to face [problems with memory mapped
>> I/O].
>
>What is the characteristic that makes "I/O mapped" I/O (that is, I/O that
>uses special instructions to access the I/O address space) safe? The
>characteristic is that the instruction can not fault in such a way that
>an I/O operation is performed twice. That is, "I/O mapped" instructions
>are inherently load/store.
>
>What is the characteristic that makes memory mapped I/O dangerous? That
>if a memory-memory instruction with an I/O device at one end faults, the
>I/O operation can be duplicated. The simple solution is... don't perform
>memory-memory instructions on I/O. Fairly easy in CISC, and simple in most
>RISC architectures... they don't have memory-memory operations.
>-- 
Peter,

Can't agree with your statement about it being fairly easy to avoid doing
memory operations on memory-mapped I/O on CISCs.  There are several
CISCs (the 68k and 32k come immediately to mind) that don't have I/O
instructions of any sort, therefore you have to map the I/O registers
into the machine's memory space.  Assume you've got a FIFO type device
such as a USART.  Any read of the data register will be destructive, i.e.
the memory location value will change as a function of the arriving characters
in the USART's FIFO.  The only way that a 68k or 32k can talk to this 
device is via the micro's memory space.  If the micro is designed to 
"instruction restart" you have to guarantee that the select NEVER got
out to the USART.  This tends to get in the way of building zero 
wait-state hardware ;-)   I think this is the type of problem Bruce
is talking about (Hi Bruce).

Steve Wilson
Consultant at large
Currently serving time at Wyse Technology

Standard Disclaimer - These are my opinions, not those of my employer's.

ingoldsb@ctycal.COM (Terry Ingoldsby) (09/29/89)

In article <2451@wyse.wyse.com>, stevew@wyse.wyse.com (Steve Wilson xttemp dept303) writes:
> In article <6283@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
> >In article <5876@tolerant.UUCP>, bruce@tolerant.UUCP (Bruce Hochuli) writes:
> >> Back to the larger issue, I still don't understand how re-issuing
> >> an instruction could avoid having to face [problems with memory mapped
> >> I/O].
> >
> >What is the characteristic that makes "I/O mapped" I/O (that is, I/O that
> >uses special instructions to access the I/O address space) safe? The
> >characteristic is that the instruction can not fault in such a way that
> >an I/O operation is performed twice. That is, "I/O mapped" instructions
> >are inherently load/store.
> >
> >What is the characteristic that makes memory mapped I/O dangerous? That
> >if a memory-memory instruction with an I/O device at one end faults, the
> 
> Can't agree with your statement about it being fairly easy to avoid doing
> memory operations on memory-mapped I/O using  CISCS.  There are several

I think what is being suggested is a method of preserving the philosophy
of I/O mapped instructions using a processor that only has memory mapped
I/O.  Generally, what you are trying to avoid is a page fault (or some
other sort of interrupt/trap/bus error) that would happen part way through
an instruction, thus causing it to wait for the fault to clear, and then
re-execute the entire instruction.  Suppose, for example, you wanted to
read a data byte from the data register (which is mapped to address addr1)
of a serial device.  If you performed a CISCy, memory based instruction like:
    MOVM      addr1,addr2    (ie Move data found at location addr1 to addr2)
then if addr1 and addr2 are not in the same page, a page fault may occur.
Since the processor might not know this until it had accessed addr1 (and
was trying to access addr2) then execution would pause until the fault
had been corrected (ie. the page brought into memory) and then the whole
instruction would repeat.  The next value would be read from the peripheral
data register (losing the previous value).

As an alternative, do the following:
    MOV       addr1,R1      (ie Move data found at addr1 into register R1)
    STO       R1,addr2      (store register contents in addr2)
Even if the fault occurs during the read of addr1 (a bit odd since
peripheral memory locations are usually non-pageable) then the access will
still only occur once.  If the page fault occurs during the store then
we similarly don't care since addr2 will also only be accessed once (in my
example it's just memory but could conceivably be another peripheral).
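
The difference between the two sequences can be sketched as a C
simulation, assuming a restart-style machine and a destination page that
faults on first touch.  The FIFO contents, the one-shot fault and all
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

static const uint8_t fifo[] = { 'A', 'B', 'C' };  /* bytes waiting in the device */
static unsigned fifo_pos = 0;
static bool dest_page_present = false;
static uint8_t dest;                              /* plays the role of addr2 */

/* Destructive read: each access to the data register pops one byte. */
static uint8_t read_data_register(void)
{
    return fifo[fifo_pos++];
}

/* Returns false on a page fault; the "handler" pages the destination in. */
static bool try_store(uint8_t value)
{
    if (!dest_page_present) {
        dest_page_present = true;
        return false;
    }
    dest = value;
    return true;
}

/* Memory-to-memory move under restart: a fault on the store reruns the
   whole instruction, including the device read. */
uint8_t movm_with_restart(void)
{
    for (;;) {
        uint8_t value = read_data_register();
        if (try_store(value))
            return dest;
    }
}

/* Load into a register, then store: only the faulting store is rerun. */
uint8_t load_then_store(void)
{
    uint8_t r1 = read_data_register();
    while (!try_store(r1))
        ;
    return dest;
}

void reset_demo(void)
{
    fifo_pos = 0;
    dest_page_present = false;
}
```

In this model movm_with_restart() stores 'B' and silently loses 'A',
while load_then_store() stores 'A' after touching the device once.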

Sorry to ramble on (I usually prefer to listen), but I thought there was
some ambiguity in the discussion.

-- 
  Terry Ingoldsby                       ctycal!ingoldsb@calgary.UUCP
  Land Information Systems                           or
  The City of Calgary         ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb

ok@cs.mu.oz.au (Richard O'Keefe) (09/30/89)

In article <477@ctycal.UUCP>, ingoldsb@ctycal.COM (Terry Ingoldsby) writes:
> As an alternative, do the following:
>     MOV       addr1,R1      (ie Move data found at addr1 into register R1)
>     STO       R1,addr2      (store register contents in addr2)
> Even if the fault occurs during the read of addr1 (a bit odd since
> peripheral memory locations are usually non-pageable) then the access will
> still only occur once.  If the page fault occurs during the store then
> we similarly don't care since addr2 will also only be accessed once (in my
> example it's just memory but could conceivably be another peripheral).

It doesn't sound all that unreasonable for a page fault to occur during
a memory-mapped I/O operation.  Imagine a memory-mapped scheme where each
device has all its registers in a different page of I/O space, and where
the operating system is running a "virtual machine" scheme.  All I/O
pages would initially be mapped out of a process's address space.  Touching
an I/O page would cause a page fault, at which time the O/S would check
whether the process had permission to access that device, and if so would
map the page in.  If the O/S needed to seize control of the device back,
it would map the page out again.

Whether such a scheme is useful or not is another matter.
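
A minimal C sketch of the map-on-fault scheme, assuming one page per
device and a per-process permission table.  struct process,
touch_device() and revoke_device() are hypothetical names; no real
operating system's interface is implied.

```c
#include <stdbool.h>

#define NUM_DEVICES 4

struct process {
    bool may_use[NUM_DEVICES];  /* permission granted by the O/S */
    bool mapped[NUM_DEVICES];   /* is the device's page in the address space? */
    unsigned faults;            /* page faults taken on I/O pages */
};

/* Page-fault handler for an I/O page: check permission, then map it in. */
static bool handle_io_fault(struct process *p, unsigned dev)
{
    p->faults++;
    if (!p->may_use[dev])
        return false;
    p->mapped[dev] = true;
    return true;
}

/* An access succeeds directly if the page is mapped; otherwise it faults
   and succeeds only if the handler maps the page. */
bool touch_device(struct process *p, unsigned dev)
{
    if (p->mapped[dev])
        return true;
    return handle_io_fault(p, dev);
}

/* The O/S seizes the device back by unmapping its page. */
void revoke_device(struct process *p, unsigned dev)
{
    p->mapped[dev] = false;
}
```

Only the first access after a mapping change costs a fault; later accesses
go straight through until the O/S revokes the page.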
-- 
GNUs are more derived than other extant alcelaphines,| Richard A. O'Keefe
such as bonteboks, and show up later in the fossil   | visiting Melbourne
record than less highly derived species.  (Eldredge) | ok@munmurra.cs.mu.OZ.au

ingoldsb@ctycal.COM (Terry Ingoldsby) (10/02/89)

In article <2255@munnari.oz.au>, ok@cs.mu.oz.au (Richard O'Keefe) writes:
> In article <477@ctycal.UUCP>, ingoldsb@ctycal.COM (Terry Ingoldsby) writes:
> > As an alternative, do the following:
> >     MOV       addr1,R1      (ie Move data found at addr1 into register R1)
> >     STO       R1,addr2      (store register contents in addr2)
> > Even if the fault occurs during the read of addr1 (a bit odd since
> > peripheral memory locations are usually non-pageable) then the access will
> 
> It doesn't sound all that unreasonable for a page fault to occur during
> a memory-mapped I/O operation.  Imagine a memory-mapped scheme where each
> device has all its registers in a different page of I/O space, and where
> the operating system is running a "virtual machine" scheme.  All I/O
> pages would initially be mapped out of a process's address space.  Touching
> an I/O page would cause a page fault, at which time the O/S would check
> whether the process had permission to access that device, and if so would
> map the page in.  If the O/S needed to seize control of the device back,
> it would map the page out again.
> 
Granted, it could happen under some conditions.  I was thinking more of
the case where the user doesn't actually do the memory access, but rather
makes a system call for the OS to do it.  Many OSes lock things like disk
device drivers, buffer space, tables, etc. into memory and I assumed that
many OSes would do the same with the memory locations corresponding to
devices.  This wouldn't be hard to do since a small region of memory could
be dedicated to this.  Whether this is done or not I have no idea (any
OS gurus out there know how it really is done?).  Nonetheless, it doesn't
make any difference since the instructions I suggested still work even if
the fault occurs.

The only thing that I can see happening if the page is not locked into
memory is if a swapper tries to write a region of memory out to disk.  That
could provoke some unusual behaviour of the peripherals!  A similar argument
would exist for swapping (or paging - if it is a hard page fault) the region
back into memory.  Soft page faults (ie. those where the page is in physical
memory, but not in a process's page table) just involve setting up some
pointers and don't actually affect (ie. read/write) the memory locations.
Is any of this making sense?
-- 
  Terry Ingoldsby                       ctycal!ingoldsb@calgary.UUCP
  Land Information Systems                           or
  The City of Calgary         ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb

roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (10/02/89)

In article <477@ctycal.UUCP> ingoldsb@ctycal.COM (Terry Ingoldsby) writes:
>
>As an alternative, do the following:
>    MOV       addr1,R1      (ie Move data found at addr1 into register R1)
>    STO       R1,addr2      (store register contents in addr2)
>Even if the fault occurs during the read of addr1 (a bit odd since
>peripheral memory locations are usually non-pageable) then the access will
>still only occur once.  If the page fault occurs during the store then
>we similarly don't care since addr2 will also only be accessed once (in my
>example its just memory but could conceivably be another peripheral).
>
That the peripheral memory location could page fault is only part of
the problem (and as you rightly point out perhaps a not very realistic
one at that).

Because (out of order) pre-fetching can be done you can have:

 	MOV	any_old_addr1,any_old_addr2
	MOV	peripheral_addr,R1

With such pre-fetching, peripheral_addr can be accessed before one of the
any_old_addresses.  If either of the any_old_addresses then turns out not
to be accessible, the first instruction faults and you have a problem even
though peripheral_addr _is_ accessible: the device has already been read.

And how about dual pre-fetching for both taken and not taken path streams?
(Again a 68040 special).
If the memory-mapped io instruction is the first instruction in each of
the branches then in my book you've got a problem.
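
The dual-path case can be sketched as a toy C model.  It is not a model of
the 68040; it only assumes a front end that fetches operands for both
branch directions before the branch is resolved, and all names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

static unsigned device_reads = 0;   /* counts accesses that reach the device */

static uint8_t read_device(void)
{
    device_reads++;
    return 0x5A;
}

static uint8_t read_memory(void)
{
    return 0x11;
}

/* Speculative front end: operands for both paths are fetched, then one
   result is kept.  The taken path starts with a memory-mapped I/O load. */
uint8_t branch_with_dual_prefetch(bool taken)
{
    uint8_t taken_operand = read_device();
    uint8_t fallthrough_operand = read_memory();
    return taken ? taken_operand : fallthrough_operand;
}

/* Non-speculative version: the device is touched only on the path
   that actually executes. */
uint8_t branch_resolved_first(bool taken)
{
    return taken ? read_device() : read_memory();
}
```

In the speculative version the device is read even when the branch is not
taken, which is harmless for memory but not for a register whose reads
have side effects.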

I'd like to recall a quote from an excellent posting by
Steve Melvin of the University of California, Berkeley, two weeks ago on this
subject:

"...there are many sides to this issue, but I don't
think it's fair to say that load/store *architectures* make life simpler for
systems programmers; using simple loads and stores is pretty much of a
requirement as has been pointed out, regardless of whether memory to memory
instructions exist.  But which instructions are used and what restrictions
need to be placed on them is secondary to the real issue here.  The bottom
line for an I/O instruction is that it represents a synchronization point from
the perspective of the hardware.  That is, all unconfirmed operations have to
be verified before the I/O operation can take place.  All predicted branches
have to be confirmed, all pending memory reads and writes have to at least be
translated to verify that they can be completed and all operations that can
generate exceptions have to be executed.  Generally this means that the entire
pipeline has to be drained.  This is a simple fact, there is no way around it
(at least not as long as reads have side-effects and there is no "undo"
function.)"

-- 
"Geld groeit me niet op de rug." Literally: "Money doesn't grow on my back."
(Often overheard at the supermarket counter from mothers to their kids.)
Roelof Vuurboom  SSP/V3   Philips TDS Apeldoorn, The Netherlands   +31 55 432226
domain: roelof@idca.tds.philips.nl             uucp:  ...!mcvax!philapd!roelof

baum@Apple.COM (Allen J. Baum) (10/03/89)

[]
>In article <2255@munnari.oz.au> ok@cs.mu.oz.au (Richard O'Keefe) writes:
>  Imagine a memory-mapped scheme where each
>device has all its registers in a different page of I/O space, and where
>the operating system is running a "virtual machine" scheme.  All I/O
>pages would initially be mapped out of a process's address space.  Touching
>an I/O page would cause a page fault, at which time the O/S would check
>whether the process had permission to access that device, and if so would
>map the page in.  If the O/S needed to seize control of the device back,
>it would map the page out again.
>
>Whether such a scheme is useful or not is another matter.
 This is very similar to the HP "Spectrum" ..er.. Precision's IO scheme.
All I/O devices are mapped onto two pages. Data registers are mapped to one,
and control registers to the other (generally). The idea is that you can
keep control of a device, and still let programs have access to the data.
Really, there is no reason not to let a user have direct access to his/her
own serial port; it can't affect security. You may not want to give them
access to the control registers, especially if they affect more than one
line. Direct access for the common stuff means a LOT lower overhead. You can
get a keystroke with a "Load" instruction in one cycle, instead of a system
call that is likely to cost you a millisecond.

--
		  baum@apple.com		(408)974-3385
{decwrl,hplabs}!amdahl!apple!baum

andrew@frip.WV.TEK.COM (Andrew Klossner) (10/03/89)

[]

	"The bottom line for an I/O instruction is that it represents a
	synchronization point from the perspective of the hardware.
	That is, all unconfirmed operations have to be verified before
	the I/O operation can take place."

This is a sufficient but not a necessary condition.  It's more
restrictive than it needs to be.

By way of counter-example:  the M88k has a lot of pipelining, and some
of the floating-point exceptions are imprecise.  I might very well have
code like this:

	fmul.sss	r2,r3,r4	; start a floating multiply
	ld		r5,r6,0		; start an I/O read

and, while the load is underway, the FPU decides to fire off a floating
underflow exception.  That's fine, I expected this and my floating
exception handler substitutes zero, cleans out the pipes, and returns.
I don't need to idle for four cycles waiting for the multiply to finish
just on the rare chance that it will underflow.

If for some reason I want synchronization, I'll use the provided
instructions to suspend execution until all the pipelines have drained.
But I don't need to do this on every I/O op.
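
[A rough sketch of the overlap described above, not a model of the real
M88k pipeline. The four-cycle multiply latency is taken from the text; the
run() function, the event list and the underflow threshold are assumptions
made up for the example. The point is only that the device is read once,
whether or not the multiply later traps.]

```python
FMUL_LATENCY = 4   # assumed, from "idle for four cycles" above

def run(a, b, device_reads, tiny=1e-38):
    """Issue fmul, then ld, without waiting for the multiply to finish."""
    events = [(0, "issue fmul")]
    # Cycle 1: the load issues and reads the device exactly once.
    r5 = device_reads.pop(0)
    events.append((1, "ld reads device"))
    # Cycle FMUL_LATENCY: the multiply finishes; an underflow is only
    # reported now, after the load has gone ahead (imprecise exception).
    product = a * b
    if product != 0 and abs(product) < tiny:   # tiny: rough threshold
        events.append((FMUL_LATENCY, "underflow trap: handler substitutes 0"))
        product = 0.0
    return product, r5, events
```

Called with a queue of two pending device values, an underflowing
multiply still leaves the second value in the queue: the load was not
repeated by the exception handler.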

  -=- Andrew Klossner   (uunet!tektronix!frip.WV.TEK!andrew)    [UUCP]
                        (andrew%frip.wv.tek.com@relay.cs.net)   [ARPA]

vorbrueg@bufo.usc.edu (Jan Vorbrueggen) (10/03/89)

In article <477@ctycal.UUCP> ingoldsb@ctycal.COM (Terry Ingoldsby) writes:
>
>As an alternative, do the following:
>    MOV       addr1,R1      (ie Move data found at addr1 into register R1)
>    STO       R1,addr2      (store register contents in addr2)
>Even if the fault occurs during the read of addr1 (a bit odd since
>peripheral memory locations are usually non-pageable) then the access will
>still only occur once.  If the page fault occurs during the store then
>we similarly don't care since addr2 will also only be accessed once (in my
>example its just memory but could conceivably be another peripheral).

But what if the pagetables mapping addr1 or addr2 are pageable?
This is the case on VAXen, where a process' pagetables (for its
private virtual address space called P0 or P1) reside in the
virtual address space shared by all processes (called S0). The
pagetables for S0, of course, are stored in contiguous physical
memory, at a physical address recorded in a processor register.

So, a pagefault can occur for two reasons: the page referenced is
invalid (not in the working set), or the pagetable page is invalid.
Naturally, the pagetable page for a valid (in-working set) page
has to be valid. 
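
[The two fault causes can be sketched as a two-step lookup. This is a
simplified illustration, not VAX microcode: translate(), the dictionary
layout and the set of valid pagetable pages are hypothetical. The figure of
128 entries per pagetable page assumes 512-byte pages and 4-byte entries.]

```python
class PageFault(Exception):
    pass

def translate(vpn, process_pt, valid_pt_pages, entries_per_pt_page=128):
    """Map a virtual page number to a physical frame, or fault."""
    # Step 1: the pagetable entry itself lives in pageable system space,
    # so the pagetable page holding it must be valid first.
    pt_page = vpn // entries_per_pt_page
    if pt_page not in valid_pt_pages:
        raise PageFault("pagetable page invalid")
    # Step 2: the entry is readable; now check the page it describes.
    valid, pfn = process_pt[vpn]
    if not valid:
        raise PageFault("page invalid")
    return pfn

process_pt = {0: (True, 77), 1: (False, None), 200: (True, 5)}
valid_pt_pages = {0}   # only the first pagetable page is resident
```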

However, VMS since (at least) V2.3 and probably until today has a bug where
pages mapped to physical memory (e.g., I/O registers), though by
definition always valid, don't increment the reference count of
the pagetable page. In a memory-tight situation (in this case, the
process was only allowed 100 pages in its working set), VMS will
kindly reclaim unused memory by performing a so-called dead
pagetable scan. It proceeds to remove the pagetable mapping the device,
and next time the process references it, bang: down goes the system!
(Pagefaults are not allowed in certain situations.)
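
[A small model of the bug as described, assuming nothing about the real
VMS data structures. PageTablePage, map_page() and dead_pagetable_scan()
are invented names. The sketch only shows why a missing reference count
bump makes the scan treat a pagetable page that maps a device as dead.]

```python
class PageTablePage:
    def __init__(self):
        self.refcount = 0
        self.entries = {}

def map_page(pt_page, vpn, pfn, is_io, buggy):
    pt_page.entries[vpn] = pfn
    # The described bug: I/O mappings skip the reference count bump.
    if not (is_io and buggy):
        pt_page.refcount += 1

def dead_pagetable_scan(pt_pages):
    """Return the pagetable pages the scan would reclaim (refcount zero)."""
    return [name for name, p in pt_pages.items() if p.refcount == 0]

def build(buggy):
    pts = {"heap": PageTablePage(), "io": PageTablePage()}
    map_page(pts["heap"], vpn=0, pfn=10, is_io=False, buggy=buggy)
    map_page(pts["io"], vpn=128, pfn=0xF000, is_io=True, buggy=buggy)
    return pts

reclaimed_buggy = dead_pagetable_scan(build(buggy=True))
reclaimed_fixed = dead_pagetable_scan(build(buggy=False))
```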

Another nice situation happens when you debug an application using
a memory-mapped device. Say your application sadly has a bug and accesses
an address where there is no device. It will get an access violation
fault. So you fire up the debugger and single-step through it.
This is implemented by using special trace bits which will return
control to the debugger after every instruction. The program
writes to that nonexistent location and gets its trace-pending
trap, calling a privileged-mode routine to handle the trap. Some
instructions (or 10 microseconds) later, the bus tells the CPU that
it couldn't perform the write. Now the access violation seems to have
occurred in the privileged mode - and a crash occurs...

Jan Vorbrueggen

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (10/03/89)

In article <265@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes:
>I'd like to recall a quote from an excellent posting by Steve Melvin
>of the University of California, Berkeley, two weeks ago on this
>subject:
>The bottom
>line for an I/O instruction is that it represents a synchronization point from
>the perspective of the hardware.  That is, all unconfirmed operations have to
>be verified before the I/O operation can take place.  All predicted branches
>have to be confirmed, all pending memory reads and writes have to at least be
>translated to verify that they can be completed and all operations that can
>generate exceptions have to be executed.  Generally this means that the entire
>pipeline has to be drained.  

I'm afraid I don't agree.

Since direct control over a physical IO device implies some level of
privilege, it is reasonable to require each handler to memory-lock
its pages beforehand. It is also reasonable to insist that the
handler not divide by zero (etc) within the immediate vicinity of the
IO action. Handlers don't need a "drain pipeline" instruction, if
they are not going to fault.

As proof, I offer the working systems that are out in the world.

You seem to be asking for processors that are more complicated, and
complication is not free. You cannot justify your case by showing how
it will make handlers possible: they already are.  So, you will have
to show how it makes handlers better, or cheaper. 

Old truism, disguised as an old joke:
	"Doctor, it hurts when I do THIS."
	"Then don't do that."
-- 
Don		D.C.Lindsay 	Carnegie Mellon Computer Science

melvin@ucbarpa.Berkeley.EDU (Steve Melvin) (10/03/89)

In article <4796@orca.WV.TEK.COM> andrew@frip.wv.tek.com writes:
>By way of counter-example:  the M88k has a lot of pipelining, and some
>of the floating-point exceptions are imprecise.  I might very well have
>code like this:
>
>	fmul.sss	r2,r3,r4	; start a floating multiply
>	ld		r5,r6,0		; start an I/O read
>
I'm a little surprised that the 88K can actually do the I/O read before it
knows if the multiply will generate a fault.  Can you absolutely confirm this?
Either way though, you have brought up a valid point, and that is that some
memory mapped I/O registers may not need to be synchronized on.  Certainly, a
reasonable alternative to having the hardware assume all I/O reads have
side-effects would be to let the programmer specify explicitly when they want
synchronization and not provide it otherwise. If they expect only one read of
the I/O register, the machine will still have to synchronize, but perhaps this
is not always required.  But this was my point.  My original posting was
suggesting that it is better to have the programmer explicitly let the
hardware know when a read has a side-effect rather than to have the hardware
discover this fact after address translation.  The thing to keep in mind,
however, is that in some situations it may be difficult for the programmer to
know if it's "safe" not to synchronize unless the read truly has no
side-effects.  The processor may not yet have confirmed a branch that has been
predicted many instructions back.

In article <6384@pt.cs.cmu.edu> lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) writes:
>Since direct control over a physical IO device implies some level of
>privilege, it is reasonable to require each handler to memory-lock
>its pages beforehand. It is also reasonable to insist that the
>handler not divide by zero (etc) within the immediate vicinity of the
>IO action. Handlers don't need a "drain pipeline" instruction, if
>they are not going to fault.
>
This is certainly a valid approach.  That is, why not put the responsibility
on the programmer to guarantee that no memory or arithmetic exceptions will
occur in the vicinity of the I/O instruction?  Then, the processor would not
have to confirm outstanding memory and arithmetic operations.  But faults are
not the only issue.  Even in the situation you propose, the hardware would
still have to confirm outstanding branch predictions and it is also the case
that there would have to be some sequencing control, implicit or explicit, in
order to force multiple I/O reads to occur in program order.  The question of
exactly what is the *vicinity* gets a bit tricky also.  In some situations,
the vicinity might be quite large, and within the *dynamic* instruction
stream, which may be difficult to know exhaustively.

>As proof, I offer the working systems that are out in the world.
>
OK, I don't claim to know much about the real world.  But I would be
interested to know what systems out there work as you suggest.  That is,
processors that do not guarantee that a memory mapped I/O read will not take
place multiple times if it is in the vicinity of an instruction that could
fault.  This is certainly not the case for any VAX.

-------
Steve Melvin
University of California, Berkeley
-------

marc@oahu.cs.ucla.edu (Marc Tremblay) (10/03/89)

In article <4796@orca.WV.TEK.COM> andrew@frip.wv.tek.com writes:
>By way of counter-example:  the M88k has a lot of pipelining, and some
>of the floating-point exceptions are imprecise.  I might very well have
>code like this:
>
>	fmul.sss	r2,r3,r4	; start a floating multiply
>	ld		r5,r6,0		; start an I/O read
>
>and, while the load is underway, the FPU decides to fire off a floating
>underflow exception.  That's fine, I expected this and my floating
>exception handler substitutes zero, cleans out the pipes, and returns.
>I don't need to idle for four cycles waiting for the multiply to finish
>just on the rare chance that it will underflow.

Notice that these two instructions do not have any control or data
dependencies. The load could have been executed before the floating 
point multiply and the result would have been the same.
So in this example it really doesn't matter (for the exception
routine) if ld is executed before fmul.sss!

Problems would occur if instructions occurring after the "faulty" instruction
modify registers needed in the exception routine.
Fortunately the 88000 provides special registers which contain the
necessary information for the software to complete an instruction
that caused an imprecise exception.

					Marc Tremblay
					marc@CS.UCLA.EDU

ingoldsb@ctycal.UUCP (Terry Ingoldsby) (10/05/89)

In article <20260@usc.edu>, vorbrueg@bufo.usc.edu (Jan Vorbrueggen) writes:
> In article <477@ctycal.UUCP> ingoldsb@ctycal.COM (Terry Ingoldsby) writes:
> >
> >As an alternative, do the following:
> >    MOV       addr1,R1      (ie Move data found at addr1 into register R1)
> >    STO       R1,addr2      (store register contents in addr2)
> But what if the pagetables mapping addr1 or addr2 are pageable?


Sorry, I'm not quite following how this changes anything.  I.e., why will
that make addr1 or addr2 get accessed more than once?
-- 
  Terry Ingoldsby                       ctycal!ingoldsb@calgary.UUCP
  Land Information Systems                           or
  The City of Calgary         ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (10/10/89)

In article <487@ctycal.UUCP> ingoldsb@ctycal.UUCP (Terry Ingoldsby) writes:
>In article <20260@usc.edu>, vorbrueg@bufo.usc.edu (Jan Vorbrueggen) writes:
>> In article <477@ctycal.UUCP> ingoldsb@ctycal.COM (Terry Ingoldsby) writes:
>> >
>> >As an alternative, do the following:
>> >    MOV       addr1,R1      (ie Move data found at addr1 into register R1)
>> >    STO       R1,addr2      (store register contents in addr2)
>> But what if the pagetables mapping addr1 or addr2 are pageable?
>
>Sorry, I'm not quite following how this changes anything.  I.e., why will
>that make addr1 or addr2 get accessed more than once?

The worst that can happen is that processing the STO will cause a
fault, after the MOV has already started its memory read, and before
the MOV has completed.

There are four ways out of that situation:

1) Abort the MOV and restart it later. This is the dreaded Bad Thing.

2) The hardware can finish the outstanding instruction before honoring
   the fault.

3) The data read from memory can be latched, and the fault handler
   can rationalize things.

4) My personal favorite: "Then don't do that." Either faults are OK,
   or they aren't. If they aren't, then the code segment should be
   memory locked. Locking the page table entries should be an
   automatic consequence of getting the OS to lock the pages.
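
[Options 1 and 3 above can be compared with a toy device whose reads have
a side effect. This is a sketch under stated assumptions: the Fifo class,
mov_sto() and the "restart"/"latch" policy names are hypothetical, and the
fault is simply injected once in the middle of the MOV.]

```python
from collections import deque

class Fifo:
    """Hypothetical device register: every read pops one byte."""
    def __init__(self, data):
        self.q = deque(data)
        self.reads = 0

    def read(self):
        self.reads += 1
        return self.q.popleft()

def mov_sto(dev, fault_once, policy):
    """Run MOV addr1,R1 ; STO R1,addr2 with one fault arriving mid-MOV."""
    latched = None
    while True:
        value = latched if latched is not None else dev.read()
        if fault_once:
            fault_once = False
            if policy == "latch":   # option 3: keep the data already read
                latched = value
            # policy "restart" is option 1: the value read is thrown away
            continue                # fault handled, instruction re-executed
        return value                # the STO then writes this to addr2

dev_restart = Fifo(b"AB")
stored_restart = mov_sto(dev_restart, fault_once=True, policy="restart")

dev_latch = Fifo(b"AB")
stored_latch = mov_sto(dev_latch, fault_once=True, policy="latch")
```

With restart the device is read twice and the first byte is lost; with the
latch it is read once and the first byte is what gets stored.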

-- 
Don		D.C.Lindsay 	Carnegie Mellon Computer Science