[comp.sys.apple2] Adding an MMU- the whole hullabaloo

jb10320@uxa.cso.uiuc.edu (Desdinova) (12/19/90)

In article <276e74a4.7324@petunia.CalPoly.EDU> rbannon@batman.elee.CalPoly.EDU (Roy Bannon) writes:
>
>Hear, hear!  Let's get a positive discussion going.
>
>I am pretty much finished with my DSP card project.  After hacking 
>a DSP card which is able to sample at 47k and has a processor on board
>that is doing 8 Mips, I'm ready for another hardware project.  I think 
>an MMU hack would be perfect.  I am game for a try anyway.  So, what does
>it need, from the software point of view, that is?  I remember hearing
>something to the effect that the abort line on the WDC 65C816 doesn't work
>reliably.  Is this true?  Might make things a bit more complicated if it
>is true.  Initial thought: how about a daughterboard that plugs into the
>65816 socket and has the 65816 and an MMU on it?  Something to think about
>anyway.  

   Hmm. Okay, there are basically two premade MMUs you can use, both are
Motorola.  The 68451 and the 68851.  The '451 is a segmentation-based
MMU that supports 32 segments.  Something like this wouldn't be too useful,
since at any one time there are MANY MANY (often around 50-100) allocated
blocks of memory.  The '851 is a full demand-paging MMU, and supports up
to 4-level page tables (not necessary for the GS- one level will suffice).
I would choose the '851.
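
To make the one-level page table idea concrete, here is a minimal sketch of the
address translation it implies. The 4K page size and the names (translate,
PageFault, page_table) are assumptions for illustration only; nothing here is
taken from the '851 data sheet.

```python
PAGE_SIZE = 4096                  # assumed page size (4K); the post does not fix one
ADDRESS_BITS = 24                 # the 65816 has a 24-bit address bus
NUM_PAGES = (1 << ADDRESS_BITS) // PAGE_SIZE   # 4096 entries in one flat table


class PageFault(Exception):
    """Raised when a virtual page has no physical frame mapped (hypothetical name)."""


def translate(page_table, vaddr):
    """Map a 24-bit virtual address to a physical one through a one-level table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # virtual page number, offset in page
    frame = page_table[vpn]
    if frame is None:
        raise PageFault(vpn)
    return frame * PAGE_SIZE + offset


page_table = [None] * NUM_PAGES
page_table[0x012] = 0x345         # virtual page $012 lives in physical frame $345
```

With these assumptions a single table of 4096 entries covers the whole 16MB
address space, which is why one level would be enough here.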
  As for the /ABORT pin, it's true that it doesn't work properly for a few
instructions.  However, at least two of the instructions it doesn't work
for would never generate a page fault (the SEP, REP pair I think), so it
doesn't matter.  If anyone can provide a concrete list of the other instructions
that don't work, we can check those.  If there ARE instructions that can
cause a fault but don't abort properly, we can just KILL the process.
A small limitation, but one I'm sure we can tolerate.
  With the /ABORT stuff worked out, you can implement full demand paging.
The page device should ideally be a fast hard drive, but Slinky RAM cards
and PC Transporter RAM cards would also make good ones, for a more cost-minded
approach. But it's likely anyone putting out for the MMU board and the 
multitasking software will have a HD.  If not, it's just a limitation,
but not a deadly one.
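
As a rough illustration of what demand paging means in practice, the sketch
below keeps a fixed number of pages resident and pushes the oldest one out to
a backing store on a fault. The DemandPager class, the FIFO eviction policy,
and the dict standing in for the page device are all assumptions made for the
example, not a design for the actual board or software.

```python
PAGE_SIZE = 4096   # assumed page size


class DemandPager:
    """Toy demand pager: FIFO eviction to a dict that stands in for the page device."""

    def __init__(self, num_frames, backing_store):
        self.num_frames = num_frames
        self.backing_store = backing_store   # page number -> bytes
        self.resident = {}                   # page number -> bytearray, in load order
        self.faults = 0

    def access(self, page):
        """Return the page's data, paging it in (and a victim out) if needed."""
        if page not in self.resident:
            self.faults += 1
            if len(self.resident) >= self.num_frames:
                victim = next(iter(self.resident))          # oldest resident page
                self.backing_store[victim] = bytes(self.resident.pop(victim))
            saved = self.backing_store.get(page, bytes(PAGE_SIZE))
            self.resident[page] = bytearray(saved)
        return self.resident[page]


pager = DemandPager(num_frames=2, backing_store={})
pager.access(1)[0] = 0xAA     # touch page 1 and modify it
pager.access(2)
pager.access(3)               # no free frame: page 1 is written out
```

Touching page 1 again would fault it back in with the modified byte intact,
which is the property that purging handles cannot give you.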
  Now we have to attack the software that will manage all this.  There are
again two choices: fixing GS/OS, and porting Unix V7 (I picked V7 since it's
what many people consider to be the last reasonable version of Unix, and
also it's what Minix is based on.)  Fixing GS/OS would be an incredibly
useful thing.  Imagine GS/OS never crashing again- NDAs would never again
kill the system.  Instead of purging memory handles, we could page them out
(reliably, without the process having to know about it as per the earlier
discussions about swapping purgeable blocks).  Imagine a reliable switcher,
even a MultiFinder.  Imagine your Mac friends being insanely envious of
the incredible capabilities unleashed by such a simple change, one THEY
can't make.  But I digress.
  Porting Unix has its benefits- the wealth of PD software.  But much of
this software is becoming available as Orca/C gets better and better.
Oh, I forgot to mention that with an MMU development becomes a snap-
your program dies, you know instantly where and what went wrong. The only
benefit Unix would have over GS/OS is... well, actually there are none.
GS/OS is an amazing operating system.  Be happy about it.

  Now to hardware constraints.  With the (hopeful) arrival of the ASIC
25MHz processor, people will be upgrading their accelerators to use the
new super-chip.  The MMU must work with both the Transwarp and the Zip GS.
There are two ways to do this.

1)   processor -> cache -> MMU -> motherboard
2)   processor -> MMU -> cache -> motherboard

The second approach is the one most often used by designers these days.
However, it's possible it will be difficult or impossible (due to timing
requirements) to do it that way.  The first method works, but would
require a way to flush the cache, which may not be possible, and besides
would mean heavy performance losses with the large cache sizes the Transwarp
and the Zip GS have.
Fortunately, the '851 is extremely fast, and since the page tables would be
prime candidates for the cache, it should perform reasonably well.  It will require
much cooperation from AE and the Zip people to make this work.
(BTW, putting the MMU in a non-accelerated environment would be cake,
but not nearly as efficient.  But it WOULD work ok, the '851 has a page
table cache).
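
The flush problem with arrangement 1 can be shown with a toy model. In the
sketch below the cache sits in front of the MMU, so it is keyed by virtual
address; after the mapping changes it keeps returning the old contents until
it is flushed. VirtualCache and the dict-based mmu and memory are hypothetical
stand-ins, not a model of the Transwarp or Zip GS caches.

```python
class VirtualCache:
    """Cache between processor and MMU, so lines are keyed by virtual address."""

    def __init__(self):
        self.lines = {}

    def read(self, vaddr, mmu, memory):
        if vaddr not in self.lines:                  # miss: go through the MMU
            self.lines[vaddr] = memory[mmu[vaddr]]
        return self.lines[vaddr]

    def flush(self):
        self.lines.clear()


memory = {0x1000: "process A data", 0x2000: "process B data"}
cache = VirtualCache()

mmu = {0x0000: 0x1000}                       # process A's mapping
first = cache.read(0x0000, mmu, memory)

mmu = {0x0000: 0x2000}                       # context switch: same virtual address, new frame
stale = cache.read(0x0000, mmu, memory)      # cache hit returns process A's data

cache.flush()
fresh = cache.read(0x0000, mmu, memory)      # after the flush, process B's data
```

With the MMU ahead of the cache (arrangement 2) the cache would see physical
addresses only, and the remapping would not require a flush.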

This has been pretty long, and some of you probably don't know what the hell
I'm saying. What I'm saying is
1) We can have incredible benefits from an MMU
2) It won't be easy, but it is VERY doable.

To do this, I need time (I can get that) and people (this depends on you).

--
Jawaid Bazyar               | Being is Mathematics 
Senior/Computer Engineering | Love is Chemistry
jb10320@uxa.cso.uiuc.edu    | Sex is Physics
   Apple II Forever!        | Babies are engineering

rhyde@ucrmath.ucr.edu (randy hyde) (12/21/90)

>>   Hmm. Okay, there are basically two premade MMUs you can use, both are
>>Motorola.  The 68451 and the 68851.  The '451 is a segmentation-based

Actually, Motorola also makes another MMU, for the 6809, which might
work.  I believe the problem with the Abort pin is that certain instructions
are not restartable.  In any case, the abort occurs before one process can
damage another.  If you're willing to give up virtual memory (you want
VM on a GS?  Think about that!), the abort pin should work fine.  If you must
have VM, you're in trouble.  A page fault on REP/SEP can get you into
trouble if the first byte is in one bank and the second in another and you
cannot properly restart the instruction (e.g., suppose it changes M & X
*BEFORE* your abort routine saves their status.  How do you know how to
restore them?)
Certainly a paged MMU is the way to go.  The hybrid paged/segmentation
system on the 386/486 is very powerful.  Having segments (in software) and
paging on a GS would clean up several memory allocation problems.
*** Randy Hyde

ericmcg@pnet91.cts.com (Eric Mcgillicuddy) (12/23/90)

>have VM, you're in trouble.  A page fault on REP/SEP can get you into
>trouble if the first byte is in one bank and the second in another and you

This will never happen, unless the assembler/compiler screws up. The Program
counter is only 16 bits and would wrap around rather than incrementing the K
register. Code must remain within the same 64k bank of memory.

Do you mean Page (in the 4k context)?
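
A small sketch of the distinction, assuming 4K pages: the operand fetch wraps
inside the 64K bank because only the 16-bit PC is incremented, but two bytes
of one instruction can still land in different pages. The function name
fetch_addresses is made up for the example.

```python
PAGE_SHIFT = 12    # assumed 4K pages


def fetch_addresses(k, pc, length):
    """24-bit addresses of each byte of an instruction; PC wraps at 16 bits, K is untouched."""
    return [(k << 16) | ((pc + i) & 0xFFFF) for i in range(length)]


# Two-byte instruction (such as REP #$30) at the very end of bank $01.
wrapped = fetch_addresses(0x01, 0xFFFF, 2)
wrapped_banks = [a >> 16 for a in wrapped]          # both bytes stay in bank $01
wrapped_pages = [a >> PAGE_SHIFT for a in wrapped]  # but they sit in different pages

# The same instruction straddling an ordinary 4K boundary inside the bank.
straddle = fetch_addresses(0x01, 0x0FFF, 2)
straddle_pages = [a >> PAGE_SHIFT for a in straddle]
```

So a bank crossing cannot happen, but a page crossing in the middle of an
instruction can, and the second page may be the one that is swapped out.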

Perhaps, given that the program counter is saved on /ABORT, it is possible to
manipulate its value so that it begins the instruction from the start, rather
than continuing half way through.

This brings up another question about atomic instructions. Suppose FLAG were in a
swapped segment; how would ASL FLAG work?

UUCP: bkj386!pnet91!ericmcg
INET: ericmcg@pnet91.cts.com

toddpw@nntp-server.caltech.edu (Todd P. Whitesel) (12/23/90)

ericmcg@pnet91.cts.com (Eric Mcgillicuddy) writes:

>Perhaps, given that the program counter is saved on /ABORT, it is possible to
>manipulate its value so that it begins the instruction from the start, rather
>than continuing half way through.

Abort is supposed to save the PC of the beginning of the instruction.
one problem solved.

>This brings up another question about atomic instructions. Suppose FLAG were in a
>swapped segment; how would ASL FLAG work?

simple, the read cycle traps. abort works fine and the segment is swapped
in before anything nasty has a chance to happen.

the instruction that could give you trouble is an ROL/ROR to a read-only
segment.

Todd Whitesel
toddpw @ tybalt.caltech.edu

ericmcg@pnet91.cts.com (Eric Mcgillicuddy) (12/25/90)

>>Perhaps, given that the program counter is saved on /ABORT, it is possible to
>>manipulate its value so that it begins the instruction from the start, rather
>>than continuing half way through.
>
>Abort is supposed to save the PC of the beginning of the instruction.
>one problem solved.

I'm afraid that I don't understand; it seems different from what I know about
the 65xxx series. Usually the opcode is fetched and the PC is incremented and
then 0-3 subsequent fetches and increments are performed. If any of these are
in a swapped segment then there is trouble. Are you saying that the PC is
saved in an internal temporary register, just in case, and that it is restored
once the /ABORT interrupt is finished? (with the first RTI I assume)

Is this function also responsible for the VPA and VDA (Valid Program/Data
Address) signals coming from the chip?

Rotating a ROM location would be a program bug and should fail on that count.
What does it do that makes it even worse?

UUCP: bkj386!pnet91!ericmcg
INET: ericmcg@pnet91.cts.com

rhyde@ucrmath.ucr.edu (randy hyde) (12/29/90)

>> Do you mean Page (in the 4k context)?

Yep.  Or whatever page size you prefer.  I think that 64K is a little
large for a PMMU.  You'd spend all day swapping pages.

>> Suppose FLAG were in a swapped segment, how would ASL Flag work?

ASL is an R-M-W (read-modify-write) operation.  On the read you'd get an abort.  After swapping
in the appropriate page, the memory manager would *restart* the ASL instr.
allowing it to finish properly.  Since ASL has *not* yet modified the flag
(only does so on the write operation) things are okay.
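
The restart sequence can be sketched as a toy simulation. Everything here is
an assumption for illustration (8-bit accumulator mode, byte-granular
"swapping", the Machine and run_asl names); it only models the ordering
argument: the read faults before anything has been modified, so rerunning the
instruction from the start is safe.

```python
class Abort(Exception):
    """Stands in for the /ABORT signal raised on access to a non-resident address."""


class Machine:
    def __init__(self):
        self.memory = {}     # resident bytes: address -> value
        self.swapped = {}    # paged-out bytes: address -> value
        self.carry = 0

    def read(self, addr):
        if addr not in self.memory:
            raise Abort(addr)
        return self.memory[addr]

    def asl(self, addr):
        value = self.read(addr)              # read cycle: may abort, nothing changed yet
        self.carry = value >> 7              # modify: top bit goes to carry
        self.memory[addr] = (value << 1) & 0xFF   # write cycle


def run_asl(machine, addr):
    """Run ASL, servicing aborts by paging the byte in and restarting the instruction."""
    aborts = 0
    while True:
        try:
            machine.asl(addr)
            return aborts
        except Abort as fault:
            aborts += 1
            missing = fault.args[0]
            machine.memory[missing] = machine.swapped.pop(missing)


m = Machine()
m.swapped[0x2000] = 0x81      # FLAG is paged out
abort_count = run_asl(m, 0x2000)
```

FLAG is shifted exactly once even though the instruction was started twice,
because the first attempt never got past its read cycle.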