earl@mips.COM (Earl Killian) (05/24/88)
(See previous postings for background.)
(Thanks to Andrew Klossner for his help on this one.)

> Architecture Reference

Where is the architecture fully described?
    -- Technical Summary: 32-Bit Concurrent RISC Microprocessor
       (27-page data sheet)
    -- MC78000 User's Manual Revision 0.4, October 7, 1987, Advanced
       Information (100+ page document, describes registers,
       instructions, exception processing, and timing information in
       detail; it has no doubt been renamed by now)
    -- Technical Summary: 32-Bit Cache/Memory Management Unit (CMMU)
       (19-page data sheet)
    -- MC78200 Cache and Memory Management Unit (CMMU) Architecture
       Spec. version 2.0, November 3, 1986, Advanced Information
       (80-page document, describes pre-production CMMU in detail)
    -- MC78200 User's Manual Revision 0.1, November 29, 1987, Advanced
       Information (80+ page document, like above but includes
       architecture changes which will appear in the production chip)

> Peak native MIPS

What is the clock cycle time?
    20 MHz (50 ns)
What is the peak native MIPS rate?
    20 MIPS

> Implementation technology

What are the parameters of the implementation technology?
    1.5 micron CMOS
How many chips of what kinds to build a cpu subsystem?
    1 88100
    2-8 88200s
How many pins on those chips?
    Each chip is in a 17 pin by 17 pin package, 181 pins apiece.

> Instruction format

What instruction sizes are used?
    32 bits
What size are immediate operands?
    16 bits
What size are branch displacements?
    16 bits (+-128KB)
What size are unconditional branch and call displacements?
    26 bits (+-128MB)

> Integer Registers

How are the registers organized [simple, windowed]?
    simple
How many total integer registers?
    32 32-bit registers
Hardwired zero register?
    yes, r0
    4 registers reserved for linker

> Integer Alu

What is the logical latency/issue/repeat?
    1/1/1
What is the shift latency/issue/repeat?
    1/1/1
What is the add latency/issue/repeat?
    1/1/1
What is the compare latency/issue/repeat?
    1/1/1
How is 64 bit (signed/unsigned) integer addition supported and how
many cycles?
    An "addu.co" instruction followed by an "add.ci" or "addu.ci"
    instruction.  Each is 1/1/1 for a total of 2/2/2.

> Branches

Which operand comparisons are implemented in the conditional branch
instruction, and which require a separate instruction?
    branch instructions:
        = 0, != 0, > 0, < 0, >= 0, <= 0
        bit set, bit clear
    Everything else requires a separate compare instruction.
Where is the result of separate comparisons stored [registers,
condition codes]?
    registers
Which forms of branch delay are present in instruction set [execute N
if no branch, execute N if branch, execute N always]?
    execute 1 always and execute 1 if no branch
What are the taken and not-taken cycle counts for each branch type, not
including the N delayed instructions, if executed?
    execute 1 always: 1 cycle, taken or not
    execute 1 if no branch: 1 cycle untaken, 2 cycles taken

> Loads/Stores

What addressing mode(s) do load instructions use?
    register + 16-bit unsigned displacement
    register + register
    register + register*size
What addressing mode(s) do store instructions use?
    same
Which load/store sizes are supported [8, 16, 32, 64]?
    8, 16, 32, 64
What is the load latency/issue/repeat?
    3/1/1 for 8-32, 4/2/2 for 64
What is the store latency/issue/repeat?
    1/1/1 for 8-32, 2/2/2 for 64

> Integer Multiply/Divide

How is multiply implemented [software, multiply step, hardware]?
    hardware
How many cycles to perform 32x32->32 multiply?
    4/1/1
How is divide implemented [software, divide step, hardware]?
    hardware
How many cycles to perform 32x32->32 divide?
    39/1/39
    Signed divide traps on negative operand.
How is 32x32->64 bit integer multiplication supported and how many
cycles?
    Software.  No cycle count estimate.
How is 64/32->32,32 bit integer division supported and how many cycles?
    Software.  No cycle count estimate.

> Floating Point

Are floating point registers separate from integer registers?
    no
How many 32-bit floating point registers?
    32
How many 64-bit floating point registers?
    16
How many 80-bit floating point registers?
    0
How is floating point implemented [software, coprocessor, on-chip]?
    on-chip
What are the floating point operation latency/issue/repeats?
                32-bit      64-bit      80-bit
        add     5/ 1/ 1     6/ 2/ 2     n.a.
        mul     5/ 1/ 1    10/ 2/ 2     n.a.
        div    30/ 1/30    60/ 2/60     n.a.
        sqrt    n.a.        n.a.        n.a.
Which floating point units can operate in parallel?
    add and multiply
Can floating point operate in parallel with integer?
    yes
Are floating point exceptions precise?
    some but not all

> Memory management

Page size in bytes?
    4096
How many bits in a virtual address?
    32
What is the size of the user-mode address space?
    4G
    There can be two user-mode address spaces, each 4G, if you want to
    split I&D.
How many bits in a physical address?
    32
How many bits of address space id are added to virtual addresses, if
any?
    0
Translation cache [none, off-chip, in-cache, on-chip]?
    in-cache
Translation cache size in entries?
    56
Translation cache associativity [direct-mapped, 2-set, 4-set, full]?
    full
Translation cache miss handled by [software, hardware]?
    hardware
    Also 10 512Kbyte software-managed translation entries.

> Caches

Instruction cache [none, off-chip, on-chip]?
    off-chip
Data cache [none, off-chip, on-chip]?
    off-chip
Are I and D caches separate?
    yes
I-cache total size in bytes?
    16K to 64K
I-cache associativity [direct-mapped, 2-set, 4-set, fully associative]?
    4-set
I-cache address block size in bytes (bytes per tag)?
    16
I-cache transfer block size in bytes (bytes read on cache miss)?
    16
I-cache index [virtual, physical]?
    virtual
    The distinction only matters when there is more than one CMMU on a
    memory port.  When there's just one, the index is both virtual and
    physical.
I-cache tag [virtual, physical]?
    physical
D-cache total size in bytes?
    16K to 64K
D-cache associativity [direct-mapped, 2-set, 4-set, fully associative]?
    4-set
D-cache writes [write-through, write-back]?
    write-through or write-back
D-cache address block size in bytes (bytes per tag)?
    16
D-cache transfer block size in bytes (bytes read on cache miss)?
    16
D-cache index [virtual, physical]?
    virtual
    See comment for I-cache index.
D-cache tag [virtual, physical]?
    physical
Is there a secondary cache?
    no

> Branch Prediction

What form of branch prediction is used, if any?
    none

> Other

Describe other unique or interesting features of the architecture or
its implementation.
E.g. describe the functional units, with emphasis on non-standard
units.

    There are four 32-bit scratch "control" registers available in
    supervisor mode.

    There's a user-writable "floating point control register" with bits
    like "disable divide-by-zero exception", "disable overflow
    exception", and so on.  The bits are not interpreted by the
    hardware; the exception always occurs, and it's up to the kernel to
    fix up the imprecise result and make it appear to the user as
    though the exception hadn't occurred.  The kernel does all the
    right IEEE things, including implementing not-a-number.

    There's an instruction to trap on subscript out of range.

    A bit in the PSR selects whether the data space is big-endian or
    little-endian.

    The instruction and data pipelines are exposed to software.
    Exception handling involves a lot of overhead; the code has to deal
    with up to six outstanding user page faults and up to nine
    outstanding floating point exceptions.  You can't just duck in and
    out of a device interrupt routine and then return with RTE.

--
UUCP: {ames,decwrl,prls,pyramid}!mips!earl
USPS: MIPS Computer Systems, 930 Arques Ave, Sunnyvale CA, 94086
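
The I-cache index comment in the posting above (with a single CMMU on a memory port, the index is "both virtual and physical") can be checked with a little arithmetic. The sketch below is only an illustration: it assumes each 88200 holds 16K bytes, which is inferred from the 16K-to-64K range and the 2-8 chip count rather than stated outright, and the function and variable names are made up for this example.

```python
# Check whether the cache index bits fall inside the page offset.
# Page size, associativity and block size come from the posting;
# the 16K-per-CMMU capacity is an assumption (see the text above).

PAGE_BYTES = 4096          # page size in bytes
CACHE_BYTES = 16 * 1024    # assumed capacity of one 88200
WAYS = 4                   # "4-set" associativity
BLOCK_BYTES = 16           # bytes per tag


def log2_exact(n):
    """Return log2(n) for a power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


def index_geometry(cache_bytes, ways, block_bytes):
    """Return (number of sets, lowest index bit, one past highest index bit)."""
    sets = cache_bytes // (ways * block_bytes)
    low = log2_exact(block_bytes)
    high = low + log2_exact(sets)
    return sets, low, high


sets, low, high = index_geometry(CACHE_BYTES, WAYS, BLOCK_BYTES)
page_offset_bits = log2_exact(PAGE_BYTES)

# Address translation leaves the page offset untouched, so if every index
# bit lies below page_offset_bits the virtual and physical index agree.
index_untranslated = high <= page_offset_bits

print(f"{sets} sets, index uses address bits {low}..{high - 1}")
print(f"page offset is bits 0..{page_offset_bits - 1}")
print(f"index unaffected by translation: {index_untranslated}")
```

Under these assumptions the index occupies address bits 4 through 11, exactly inside the 12-bit page offset. Selecting among several CMMUs on one port would need address bits beyond that range, which is presumably where the virtual/physical distinction starts to matter.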
tom@nud.UUCP (Tom Armistead) (05/26/88)
In article <2232@gumby.mips.COM> earl@mips.COM (Earl Killian) writes:
>
>Describe other unique or interesting features of the architecture or
>its implementation.
>E.g. describe the functional units, with emphasis on non-standard
>units.

Some other features I think were missed in the original posting:

    It has bit field instructions.  (I'm not aware of other RISC
    processors with this feature).

    Support for multiprocessing via cache coherency features of cache
    chip.

    Support for fault tolerant applications.

>The instruction and data pipelines are exposed to software. Exception
>handling involves a lot of overhead; the code has to deal with up to
>six outstanding user page faults and up to nine outstanding floating

Actually only 5 page faults and handling just 4 of them is sufficient
and optimum for performance.

>point exceptions. You can't just duck in and out of a device interrupt
>routine and then return with RTE.

You can write the exception handlers to process the interrupt first (if
interrupt latency is important) before any page faults/FP exceptions
are handled.

The page faults/FP exception handling doesn't have to be done on every
interrupt either - only those faults that occur simultaneously with the
interrupt need to be handled simultaneously.  Simultaneous
exceptions/interrupts are relatively rare in comparison to interrupts
that occur without any other pending exceptions.

The FP exceptions can also be ignored until RTE if needed.  However,
this means your interrupt handler cannot use FP (including integer
multiply and divide).  If you can make this restriction (or guarantee
no FP exceptions) and if you can guarantee no page faults, you can
indeed "duck in and out and then RTE" with an 88K interrupt handler.

How about a comp.sys.m88k (or equivalent)?
--
Just a few more bits in the stream.
The Sneek
df@nud.UUCP (Dale Farnsworth) (05/26/88)
Earl Killian (earl@mips.COM) writes:
> (Thanks to Andrew Klossner for his help on this one.)

I am pleased that Earl and Andrew made this available to the net.

> > Architecture Reference
> Where is the architecture fully described?
>
> -- MC78000 User's Manual Revision 0.4, October 7, 1987, Advanced
>    Information (100+ page document, describes registers,
>    instructions, exception processing, and timing information in
>    detail; it has no doubt been renamed by now)

Also marked Motorola Confidential/Proprietary.  The most recent version
is MC88100 User's Manual Revision 0.6, April 6, 1988, Advanced
Information.

> -- MC78200 User's Manual Revision 0.1, November 29, 1987, Advanced
>    Information (80+ page document, like above but includes
>    architecture changes which will appear in the production chip)

Current version: MC88200 User's Manual Revision 0.4 Preliminary Copy,
April 19, 1988, Advanced Information.

> The instruction and data pipelines are exposed to software.

This could be misunderstood.  The pipelines are only exposed to the
exception handler.  Hardware register scoreboarding is used by the chip
so compilers are *not required* to do pipeline instruction scheduling.

Again, thanks for the excellent information.

-Dale
--
Dale Farnsworth		602-438-3092	uunet!unisoft!nud!df
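
The point about register scoreboarding can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of the 88100's actual scoreboard logic: it assumes in-order issue of at most one instruction per cycle, and that a result with latency N becomes readable N cycles after its instruction issues. The latencies (3 for a load, 1 for an add) are taken from Earl's posting; the name `issue_cycles` and the instruction tuples are hypothetical.

```python
# Toy register scoreboard: an instruction waits until all of its source
# registers are ready, so unscheduled code is slower but never wrong.

LATENCY = {"ld": 3, "add": 1}   # result latencies from the posting


def issue_cycles(program, latency=LATENCY):
    """Return the cycle in which each (op, dest, sources) instruction issues."""
    ready_at = {}      # register name -> first cycle its value can be read
    next_slot = 0      # earliest cycle the next instruction could issue
    issued = []
    for op, dest, sources in program:
        # Stall until every source register has been produced.
        start = max([next_slot] + [ready_at.get(reg, 0) for reg in sources])
        issued.append(start)
        ready_at[dest] = start + latency[op]
        next_slot = start + 1
    return issued


# The add that uses r2 immediately follows the load that produces it.
unscheduled = [
    ("ld", "r2", ["r3"]),
    ("add", "r4", ["r2", "r5"]),
    ("add", "r6", ["r7", "r8"]),
]

# Same instructions with the independent add moved into the load shadow.
scheduled = [
    ("ld", "r2", ["r3"]),
    ("add", "r6", ["r7", "r8"]),
    ("add", "r4", ["r2", "r5"]),
]

print(issue_cycles(unscheduled))
print(issue_cycles(scheduled))
```

In this model both orderings compute the same values; reordering only recovers a stall cycle, which matches the claim that scheduling is an optimization rather than a requirement.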
brooks@lll-crg.llnl.gov (Eugene D. Brooks III) (05/27/88)
In article <798@nud.UUCP> tom@nud.UUCP (Tom Armistead) writes:
> Support for multiprocessing via cache coherency features of cache

Let's hear about these very IMPORTANT features in detail.  Anyone have
this information available to them?
tom@nud.UUCP (Tom Armistead) (06/02/88)
>> Support for multiprocessing via cache coherency features of cache

>Let's hear about these very IMPORTANT features in detail.  Anyone have
>this information available to them?

In brief:

The 88200 chip contains logic that allows it to monitor the activities
of other 88200s in the system (including other processors' 88200s).  If
an 88200 attempts to access a location in memory which does not contain
valid data (i.e. its "real" contents are in another 88200's cache),
then the 88200 containing the correct data will preempt the access and
update main memory.  The first 88200 will then continue the access and
get the correct data.

This is referred to as "snooping" and is performed by the 88200 chips
themselves - software is required to take no action (other than
configuring the 88200s in snoop mode) to maintain cache coherency
between multiprocessors.

"Snooping" takes a large burden off of the work required to implement a
multiprocessing system.

Of course semaphoring is supported as well.

I could post lots more but at this time have to limit my comments to
that information which has been released to the public.  The above is
also discussed in the Technical Summary of the MC88200.
--
Just a few more bits in the stream.
The Sneek
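
The sequence described above (a read miss is preempted, the cache holding the current data updates main memory, and the original access then completes) can be sketched as a toy simulation. This is a minimal illustration and not the 88200's actual protocol: the class and method names are hypothetical, memory is a plain dictionary, and invalidating other copies on a write is an assumption added so the toy stays consistent; the posting does not describe that step.

```python
# Toy write-back caches that snoop each other's memory accesses.

class SnoopingCache:
    def __init__(self, memory, bus):
        self.memory = memory    # shared dict: address -> value
        self.bus = bus          # shared list of all caches on the bus
        self.lines = {}         # address -> (value, dirty flag)
        bus.append(self)

    def snoop(self, addr):
        """Another cache is about to access addr: flush our dirty copy."""
        line = self.lines.get(addr)
        if line is not None and line[1]:
            self.memory[addr] = line[0]          # update main memory
            self.lines[addr] = (line[0], False)  # our copy is now clean

    def read(self, addr):
        if addr in self.lines:
            return self.lines[addr][0]
        # Miss: give every other cache the chance to preempt the access.
        for other in self.bus:
            if other is not self:
                other.snoop(addr)
        value = self.memory.get(addr, 0)
        self.lines[addr] = (value, False)
        return value

    def write(self, addr, value):
        # Assumption: other copies are flushed and invalidated on a write.
        for other in self.bus:
            if other is not self:
                other.snoop(addr)
                other.lines.pop(addr, None)
        self.lines[addr] = (value, True)         # write-back: memory is stale


memory = {0x100: 1}
bus = []
cache_a = SnoopingCache(memory, bus)
cache_b = SnoopingCache(memory, bus)

cache_a.write(0x100, 42)
stale = memory[0x100]           # memory still holds the old value
seen_by_b = cache_b.read(0x100) # cache_a flushes before cache_b reads
print(stale, seen_by_b, memory[0x100])
```

The software in this model never flushes anything explicitly; the second cache sees the new value purely because the first one reacts to the bus access, which is the burden the posting says snooping removes.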