[net.micro.16k] Bus Error Effluvia

gnu@l5.uucp (John Gilmore) (09/01/85)

Personally, :-) I think Henry was just using my vacation from the net
to see if he could slip a few 68000 flames past me.  No way, Jose!

On dumping microstate during a page fault:

My first reaction was the same as Henry's -- it's not as clean as the
old IBM 360 I grew up on, so it must be bad.  Further discussions with
various chip and architecture designers led me to realize that a BIG
problem with making fast processors is stopping them dead once you
realize you need to stop.  Motorola was going for speed as usual, so
they opted to run normal instructions quickly and page faults slowly.
I've heard that the National 16032 designers had to go to a lot of
trouble both in instruction set design and in implementation, to make
sure that they could always back out an instruction if it faulted.
IBM, Amdahl, etc. have always had to dedicate a lot of (otherwise
useless) hardware to stopping all the other instructions, or backing
them out if already completed, when you get e.g. integer overflow.  The
68010 and 68020 microcode can concentrate on doing as much work as
possible and very seldom have to worry about whether things would be
straight if a page fault occurred at that microstep.

For example, in the IBM 370 the block move instruction had to use four
general registers for the two addresses and two counts (well...  this
was IBM, remember) because if it took an interrupt or page fault, it
didn't want to restart from the beginning.  National did similarly.  In
the 68020 the CALLM instruction can copy a large block of arguments
from one stack to another; it doesn't need that kind of kludge because
it can afford to keep some internal state around.  Though I think
CALLM is a botch for other reasons.
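
To make the restart point concrete, here is a toy Python model of a block
move whose progress lives in program-visible registers.  It is only a
sketch: the register names, the PageFault exception, the 4-byte page size
and the "mapped" set are invented for illustration, and it uses a single
count rather than the two counts mentioned above.  It does not reproduce
the real 370 or National semantics.

```python
class PageFault(Exception):
    """Raised by the toy memory when an unmapped page is touched."""


PAGE = 4  # toy page size in bytes (an assumption for this demo)


def block_move(regs, mem, mapped):
    """Copy regs['len'] bytes from regs['src'] to regs['dst'].

    All progress is kept in regs, so after a fault the same instruction
    can be re-issued and resumes where it stopped instead of starting over.
    """
    while regs["len"] > 0:
        # Check both operands before touching memory for this byte.
        for addr in (regs["src"], regs["dst"]):
            if addr // PAGE not in mapped:
                raise PageFault(addr)
        mem[regs["dst"]] = mem[regs["src"]]
        regs["src"] += 1
        regs["dst"] += 1
        regs["len"] -= 1


def run_with_pager(regs, mem, mapped):
    """Re-issue the move after each fault; return how many faults occurred."""
    faults = 0
    while True:
        try:
            block_move(regs, mem, mapped)
            return faults
        except PageFault as fault:
            mapped.add(fault.args[0] // PAGE)  # "page in" the missing page
            faults += 1
```

A machine that keeps this progress in hidden internal state instead has to
save that state somewhere at fault time, which is what the long bus error
stack frame is for.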

Somebody mentioned that they feel creepy about the idea of the system
going on and executing more instructions after a bus error.  It's not
really that bad.  If the fault happens on a data access, it stops dead
and handles it.  If it happens on an instruction prefetch, then it
remembers that it got a fault, but goes on until it tries to execute
that instruction.  If you branch first, no fault.  This fixes a bug
wherein the 68000 would fault if your return-from-subroutine was the
last thing in a page and the next page was not executable.
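
The prefetch rule can be shown with a small Python simulation.  This is a
sketch under simplifying assumptions: a one-word prefetch, two made-up
opcodes ("nop" and "rts"), and a flag that switches between faulting at
prefetch time and faulting only when the prefetched word is about to be
executed.  None of it is taken from Motorola documentation.

```python
class BusError(Exception):
    """Raised when the toy CPU takes a bus error on an instruction fetch."""


def run(program, executable, defer_prefetch_faults):
    """Run a toy instruction stream with one-instruction prefetch.

    program: list of opcodes, "nop" or "rts" ("rts" ends the run).
    executable: set of addresses that may be fetched without error.
    Returns the list of addresses actually executed.
    """
    executed = []
    pc = 0
    while True:
        op = program[pc]
        # The next word is prefetched while the current instruction runs.
        prefetch_faulted = (pc + 1) not in executable
        if prefetch_faulted and not defer_prefetch_faults:
            raise BusError(pc + 1)  # fault taken at prefetch time
        executed.append(pc)
        if op == "rts":
            return executed  # branched away; the prefetched word is dropped
        if prefetch_faulted:
            raise BusError(pc + 1)  # fault taken only on attempted execution
        pc += 1
```

With a return as the last word of the executable region, the deferred
model finishes cleanly while the immediate model faults on a word it would
never have executed.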

In older micros "bus error" conjures up images of dead hardware and
bits of wire shorting the pins and such.  In a 680xx system, bus error
is just another way to end a bus cycle; it happens all the time.
It doesn't mean the hardware or software is broken, so there's no
reason for the CPU to stop at that instant to deal with it.
The next cycle will probably work OK even if this one got a bus error.

Tom Gunter (project mgr. or similar for 68000 family) remarked several
years ago that the reason the 68000 didn't do page faults is because he
decided it was too hard for the first release.  Remember, the VERY
first 68000 design had 16-bit data regs and 24-bit address regs.  He
needed to get the product out against Intel's "ship it first, then
design it" schedules, and he realized it needed to be a 32-bit design and
they had a lot of work to do, so he compromised on page faults.  I
think his judgement was pretty good -- if they'd shipped that first
design, it wouldn't be compatible with the 32-bit version now; if
they'd delayed another 6 months to introduction, they'd have lost the
whole show to Intel.  (Another rumor says IBM chose the 8088 for the PC
because Motorola couldn't ship 250K chips that year -- but Intel could.
Wonder if the PC-2 will use 68020 since the 386 is not real yet...)

lat@druil.UUCP (TepperL) (09/04/85)

> ...  Further discussions with
> various chip and architecture designers led me to realize that a BIG
> problem with making fast processors is stopping them dead once you
> realize you need to stop.

Wonderful.  And I thought the computer inertia discussion had died down. :-)
-- 
Larry Tepper	    {ihnp4 | allegra}!druil!lat		+1-303-538-1759

kds@intelca.UUCP (Ken Shoemaker) (09/05/85)

My initial thought about this was "who cares if the processor doesn't backtrack
and redo the instructions after returning from the bus error," but after
thinking about it for a little while, I can think of at least two problems
that this might cause.

1) Since the processor can't get out onto the bus if it is busy either waiting
	for ready or a bus error, one wouldn't think that it could corrupt
	anything that is not recoverable.  However, it can fiddle around
	with its internal state all it wants to.  Now, if one of the things
	that it does internally is muck with some registers (a completely
	valid thing to do) that would get changed as a result of the trap
	handler, we have a problem, I would think.  This seems real special
	case, though I do remember that at least some early versions of
	4BSD paid attention to how register variables were allocated by the
	c compiler, and used this information when mixing assembly with
	c code...

2) In the same tack, I think that if Mot ever includes a data cache on their
	chip, they will have to pay special attention to the amount of
	concurrency they allow for data accesses on both the external bus
	and the internal cache bus.

So what do y'all think?  Does anyone know exactly what the thing tosses on the
bus, and whether it is possible for the trap handler to modify the return
state such as to change the results of half-executed instructions
(or whole executed instructions after the bus-error write)?
-- 
...and I'm sure it wouldn't interest anybody outside of a small circle
of friends...

Ken Shoemaker, Microprocessor Design for a large, Silicon Valley firm

{pur-ee,hplabs,amd,scgvaxd,dual,qantel}!intelca!kds
	
---the above views are personal.  They may not represent those of the
	employer of its submitter.

davet@oakhill.UUCP (Dave Trissel) (09/11/85)

In article <57@intelca.UUCP> kds@intelca.UUCP (Ken Shoemaker) writes:

>thinking about it for a little while, I can think of at least two problems
>that this might cause.
>
>1) Since the processor can't get out onto the bus if it is busy either waiting
>	for ready or a bus error, one wouldn't think that it could corrupt
>	anything that is not recoverable.  However, it can fiddle around
>	with its internal state all it wants to.  Now, if one of the things
>	that it does internally is muck with some registers (a completely
>	valid thing to do) that would get changed as a result of the trap
>	handler, we have a problem, I would think.  This seems real special
>	case, though I do remember that at least some early versions of
>	4BSD paid attention to how register variables were allocated by the
>	c compiler, and used this information when mixing assembly with
>	c code...

This problem occurs for any architecture that does not do all operations
memory-to-memory.  [Is this a plug for the TMS1600? :-)]  Consider a person
forcing a breakpoint while debugging a program.  If the debugger gains
control anywhere during an instruction sequence which uses data registers
to compute a value destined for a memory location then there are problems.

  Example: The HLL statement is I = J + K.   The pseudo assembler code is

		 LOAD  J to Reg
		 ADD   K to Reg
		 STORE Reg to I

If the hardware breakpoint occurs after the LOAD then any debugger displays
or changes to I, J or K will not give the expected result.  This relates
specifically to the case you are getting at of values being held in registers
during an interrupt.
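
The same three-step sequence can be played out in Python.  This is a
sketch only: the one-register machine, the dictionary standing in for
memory, and the "debugger" callback are all hypothetical.

```python
def run_assignment(memory, stop_after=None, debugger=None):
    """Execute I = J + K as LOAD / ADD / STORE on a one-register machine.

    If stop_after names a step, debugger(memory) is called right after
    that step, as a breakpoint would.  The debugger can see and change
    memory, but not the value already held in the register.
    """
    reg = 0
    for step in ("LOAD", "ADD", "STORE"):
        if step == "LOAD":
            reg = memory["J"]
        elif step == "ADD":
            reg += memory["K"]
        else:
            memory["I"] = reg
        if step == stop_after and debugger is not None:
            debugger(memory)
    return memory["I"]
```

Stopping after the LOAD and patching J in memory has no effect on the
result, because the old J is already in the register; patching I at that
point is simply overwritten by the STORE.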

Of course, the same would be true if instead of a breakpoint switch being
pressed the task context was stored due to a task timeout.  The problem
of valid variable access is still the same.

Any software which has to deal with multi-tasking variables on any machine
will have to take great care in handling them because of this and other
problems.  The long stack store for bus-errors on the MC68020 is just a
variation of the same theme.   That was a primary reason for the addition of
the CAS (Compare and Swap) and CAS2 instructions on the '020.  These
use locked bus cycles which allow changing of values without contamination
by interrupt routines.
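
For readers who have not met compare-and-swap, here is a minimal Python
sketch of the usual retry loop built on it.  Python has no CAS
instruction, so a threading.Lock stands in for the locked bus cycle; the
Cell class and atomic_add helper are illustrative names, not anything
from the '020 manuals.

```python
import threading


class Cell:
    """A memory word with a compare-and-swap operation."""

    def __init__(self, value):
        self.value = value
        self._bus_lock = threading.Lock()  # stands in for the locked bus cycle

    def cas(self, expected, new):
        """Store new only if the word still equals expected; report success."""
        with self._bus_lock:
            if self.value != expected:
                return False
            self.value = new
            return True


def atomic_add(cell, delta):
    """Retry loop: recompute from a fresh read whenever the CAS loses a race."""
    while True:
        old = cell.value
        if cell.cas(old, old + delta):
            return old + delta
```

The update is computed in a "register" (the local variable old) and only
committed if nobody changed the word in the meantime, which is exactly the
guard the LOAD / ADD / STORE sequence lacks.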

>2) In the same tack, I think that if Mot ever includes a data cache on their
>	chip, they will have to pay special attention to the amount of
>	concurrency they allow for data accesses on both the external bus
>	and the internal cache bus.

The problems have nothing to do with caches but with deeply layered pipe
design.  And they are essentially no different for the M68000 architecture
than for any other.

>So what do y'all think?  Does anyone know exactly what the thing tosses on the
>bus, and whether it is possible for the trap handler to modify the return
>state such as to change the results of half-executed instructions
>(or whole executed instructions after the bus-error write)?

Again, what's the difference between half-executed instructions and
half-executed assignment statements?  In either case the variable is untouchable.
In any case, there is a simple technique for completing the instruction
out of the bus error handler - just set the trace bit on in the Status Reg.

  --  Dave Trissel  Motorola Semiconductor Inc.  Austin
      {seismo,ihnp4}!ut-sally!oakhill!davet

kds@intelca.UUCP (Ken Shoemaker) (09/12/85)

Ok, here is an example of what I think is a problem, pardon the pidgin
assembly language.  If you have the processor in a polling loop, and you
know that to get out of the loop you'll get a memory fault on a
write, and the status of that fault will be returned in a register by the
bus fault handler, you have to be very careful, since (if r0 is used to
pass back values)

loop:
	mov	to-fault-location,blah
	cmp	r0,1
	jnz	loop

will cause problems if the fault happens after the compare (which is
very valid, since the write is pipelined).  As far as I know, the compare
could be done even before the write bus cycle is started!  In this case,
I think all that would happen is that you would get a possibly unnecessary
write after you return from the fault handler.  Is this
important?  I haven't the foggiest, and there probably are ways around
it.  Why I think this has to do with data caches is just that you have
more internal state to work with that could possibly contain stale data
with the execution of instructions going so far beyond the qualification
of bus cycles.  The easy way out is just to not allow concurrent data
accesses to the internal data cache while there are outstanding requests
on the external bus.

Maybe I'm dense, but I don't think this is at all related to breakpoints/
etc., since these puppies will always cause the break exactly between
instructions, not leaving part of the next instruction or instructions
partially executed.  As far as I can tell, with the 68020, a bus
fault could happen in one HLL instruction, but not be signalled until
the next instruction (or the one 15 instructions down the road).
In addition, you might want to break up the
seemingly atomic code sequences that a compiler generates if you have
a bug in your code somewhere.  Regardless, when you restart the
processor, it is going to do the next instruction working with data
that you may have just modified.  It's not going to start 10 instructions
down the way and ignore any changes that you have made to the data or
the internal state that could have changed the results of the execution
of those 10 instructions.  But thinking about debuggers, doesn't
this little 68020 feature cause problems if you want to trap
on a write to memory (a la an ICE unit)?  The way I think about it,
when you trap the processor write, where does the monitor say the
processor is?

Finally, this (and anything that I would say) certainly should not
be taken as a dig on the 68020.  From all I can tell, it seems a
fine machine, and a tremendous engineering accomplishment.

davet@oakhill.UUCP (Dave Trissel) (09/16/85)

In article <69@intelca.UUCP> kds@intelca.UUCP (Ken Shoemaker) writes:

>bus fault handler, you have to be very careful, since (if r0 is used to
>pass back values)
>
>loop:
>	mov	to-fault-location,blah
>	cmp	r0,1
>	jnz	loop
>

This is very easy for the '020 programmer to solve.  The MC68020 NOP
instruction serializes the machine (i.e. guarantees all updates for previous
instructions have been done).  Therefore, by throwing in a NOP after the
instruction which does the write, such a loop on the '020 will always
properly execute.
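
The polling loop and the NOP fix can be modelled in a few lines of Python.
This is a toy sketch, not a description of the real pipeline: it assumes
the write is merely queued by the mov, that the bus error handler sets
r0 = 1 when the queued write finally reaches the bus, and that
serialization means draining that queue before going on.

```python
def poll_once(serialize):
    """One pass of: write to a faulting location, then compare r0 with 1.

    Returns True if the compare saw the value set by the fault handler.
    """
    regs = {"r0": 0}
    pending_writes = []

    def drain():
        # Each queued write hits the bus, faults, and the handler sets r0.
        while pending_writes:
            pending_writes.pop(0)
            regs["r0"] = 1

    pending_writes.append("to-fault-location")  # mov: the write is only queued
    if serialize:
        drain()  # NOP: wait until all earlier writes are done
    saw_fault = regs["r0"] == 1  # cmp r0,1
    drain()  # the queued write completes eventually in either case
    return saw_fault
```

Without serialization the compare runs ahead of the fault, so the loop
goes around once more and issues a second write before it notices; with it
the compare always sees the handler's result.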

There are some subtle matters in regards to context validity and bus errors
but such a discussion would be lengthy and esoteric.  I can only take the
time to post such things if there is enough interest expressed on the net.

There was a long discussion about 6 months ago but I don't have it archived.
(If someone does it was in net.micro.68k.)

  --  Dave Trissel           {ihnp4,seismo}!ut-sally!oakhill!davet
      Motorola Semiconductor

sarwono@puff.UUCP (09/27/85)
