[net.arch] The CLIPPER Microprocessor

kissell@spar.UUCP (Kevin Kissell) (10/08/85)

[]
	Since the Fairchild CLIPPER (tm of Fairchild Camera and Instrument
and all that) microprocessor is now public knowledge, it seems only fair to 
post a quick description of the architecture to the net.

	The CLIPPER CPU is a 32-bit microprocessor, implemented in 2-micron
CMOS.  It is not really a RISC chip, but it does directly execute all of its 
instructions without any microcode interpretation.  It is a load-store machine,
with a linear 32-bit address space.  There are two sets of 16 general-purpose
32-bit registers; one set for the user, and one for supervisory (kernel) mode,
plus a set of 8 double-precision floating-point registers.  Integer and IEEE-
compatible floating-point execution units operate as parallel functional units,
both being fed by a two-stage instruction pipe (not counting the instruction
buffer).  The 
integer execution unit itself is pipelined in three stages, and instructions 
may be executing concurrently at all three.  Integer ALU operations and 
register transfers require only a single cycle at each stage, and since the 
CPU runs on a single-phase 33MHz clock, I guess we could say it has a burst 
execution rate of 33 Mips.  Somewhat more realistically, on the Patterson 
benchmarks it averages out to between 4 and 8 Mips, depending on the benchmark.
We tend to think of CLIPPER as a 5 Mips machine.
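
	As a rough check on those figures, the sketch below works out the
peak rate and the average clocks per instruction implied by the sustained
numbers.  It assumes that "burst" means one instruction completed per clock
and takes 33MHz as exactly 33e6 Hz; the function names are made up for
illustration.

```python
# Back-of-the-envelope arithmetic for the rates quoted in the post.
# Assumption: "burst" rate means one instruction completed per clock.

CLOCK_HZ = 33_000_000  # single-phase 33MHz clock, taken as exactly 33e6 Hz


def peak_mips(clock_hz):
    """Peak rate in MIPS if one instruction completes every cycle."""
    return clock_hz / 1e6


def cycles_per_instruction(clock_hz, sustained_mips):
    """Average clocks per instruction implied by a sustained MIPS figure."""
    return clock_hz / (sustained_mips * 1e6)


if __name__ == "__main__":
    print(f"peak: {peak_mips(CLOCK_HZ):.0f} MIPS")
    for mips in (4, 5, 8):
        cpi = cycles_per_instruction(CLOCK_HZ, mips)
        print(f"{mips} MIPS sustained -> about {cpi:.2f} clocks/instruction")
```

	On these assumptions, the 5 Mips figure works out to roughly 6.6
clocks per instruction on average, against a best case of one.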

	The companion chip to the CLIPPER CPU is the "CAMMU" or cache/MMU.
The CPU chip has a Harvard-like dual-bus architecture, and putting CAMMUs 
on both the data and instruction buses allows the pipelined CPU to fetch
instructions and access data at the same time.  Each cache is a 4K-byte real-
address cache, two-way set-associative, organized as 256 16-byte ("quadword")
lines (128 sets of two).  In instruction mode, the cache autonomously prefetches
instructions from 32-bit memory in quadword bursts.  For data, the cache can 
employ either write-through or copy-back policies, selectable on a per-page 
basis by a field in the page table entry.  The MMUs provide distinct user 
and supervisor maps, with 4K-byte pages, using a two-level lookup scheme.  There 
is a translation-lookaside buffer (TLB), also two-way set-associative, which 
caches up to 128 translations in each CAMMU.  The CAMMU's interface to system 
memory is a multiplexed, synchronous, 32-bit bus, clocked at either 1/2 or 
1/4 of the CPU clock.
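
	The cache and TLB geometry that follows from those numbers can be
sketched as below.  The sizes come from the description above; the indexing
is an assumption (cache sets selected by the address bits just above the line
offset, TLB sets by the low bits of the virtual page number), and all names
are hypothetical.  The split of the page number between the two lookup levels
is not given here, so the sketch leaves it out.

```python
# Geometry implied by the CAMMU figures in the post.
# Assumptions (not stated in the post): the cache set is chosen by the address
# bits just above the line offset, and the TLB set by the low bits of the
# virtual page number. All names here are illustrative only.

CACHE_BYTES = 4 * 1024   # 4K bytes per cache
LINE_BYTES = 16          # one "quadword" line
CACHE_WAYS = 2           # two-way set-associative
PAGE_BYTES = 4 * 1024    # 4K-byte pages
TLB_ENTRIES = 128        # translations cached per CAMMU
TLB_WAYS = 2             # TLB is also two-way set-associative
CPU_CLOCK_MHZ = 33.0


def log2_exact(n):
    """Return log2(n), insisting that n is a power of two."""
    assert n > 0 and n & (n - 1) == 0, "expected a power of two"
    return n.bit_length() - 1


CACHE_LINES = CACHE_BYTES // LINE_BYTES      # 256 lines
CACHE_SETS = CACHE_LINES // CACHE_WAYS       # 128 sets
OFFSET_BITS = log2_exact(LINE_BYTES)         # 4 bits
INDEX_BITS = log2_exact(CACHE_SETS)          # 7 bits
PAGE_OFFSET_BITS = log2_exact(PAGE_BYTES)    # 12 bits
TLB_SETS = TLB_ENTRIES // TLB_WAYS           # 64 sets


def cache_fields(addr):
    """Split a 32-bit real address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (CACHE_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def page_fields(vaddr):
    """Split a 32-bit virtual address into (page number, page offset)."""
    return vaddr >> PAGE_OFFSET_BITS, vaddr & (PAGE_BYTES - 1)


def tlb_set(vaddr):
    """TLB set for a virtual address, assuming low page-number bits index it."""
    vpn, _ = page_fields(vaddr)
    return vpn & (TLB_SETS - 1)


if __name__ == "__main__":
    tag, index, offset = cache_fields(0x12345678)
    print(f"cache: tag={tag:#x} set={index} offset={offset}")
    vpn, page_offset = page_fields(0x12345678)
    print(f"page: vpn={vpn:#x} offset={page_offset:#x} tlb set={tlb_set(0x12345678)}")
    for divisor in (2, 4):
        print(f"bus clock at 1/{divisor}: {CPU_CLOCK_MHZ / divisor} MHz")
```

	With a 4-bit line offset and a 7-bit set index, the remaining 21
bits of a 32-bit address form the tag.  A 33MHz CPU clock puts the memory
bus at 16.5 or 8.25 MHz.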

Kevin D. Kissell
Fairchild Advanced Processor Division
uucp: {ihnp4 decvax}!decwrl!\
                             >spar!kissell
    {ucbvax sdcrdcf}!hplabs!/