[comp.arch] RISCs Register sets and PDP 10/20s

idall@augean.OZ (Ian Dall) (07/27/89)

All this talk of PDP 10s aliasing registers to the first locations of memory
reminded me of something I was thinking about some time ago.

The tendency is to go for large register sets. Large register sets have a
penalty in context switch time (and possibly in procedure calls of
separately compiled code). Various people have claimed that with a suitable
cache, memory bandwidth is "not a problem".

So why not put the "registers" in memory? If the "registers" are the
first n locations of the virtual address space then they will not have
to be saved and restored on context switches or interrupts. There are
a few ways to think of this. It is either memory mapped registers or
(because these are in effect just scratchpad memory locations) it is
equivalently a zero register machine. The data cache is essential to
the scheme. Because the scratchpad locations are accessed frequently
the probability of a cache hit is very high. The compiler would have
to know something of the cache dynamics so that it could decide
whether it is better to keep something in a temporary register
location (which is almost certain to be a cache hit) or to access
directly a possibly distant location.

I suspect the problem is that there is still a significant gap between
the access times of a register and of a cache location. Still, I kind
of like the idea of an infinitely large demand paged register set!
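
A minimal sketch of the scheme described above, assuming a toy machine in
which register n is simply word n of each process's private address space.
The names Process, Cpu and N_REGS are hypothetical, and the model says
nothing about cache behaviour or access time, which is where the real cost
would show up.

```python
# Toy model: "registers" are the first N_REGS words of each process's
# own virtual address space, so a context switch copies no registers.
N_REGS = 16  # hypothetical register count


class Process:
    def __init__(self, name, mem_words=64):
        self.name = name
        self.memory = [0] * mem_words  # private virtual address space


class Cpu:
    def __init__(self):
        self.current = None     # the only per-process state the CPU holds
        self.words_copied = 0   # register words saved/restored so far

    def switch_to(self, process):
        # No save/restore loop: just repoint the address space.
        self.current = process

    def write_reg(self, n, value):
        assert 0 <= n < N_REGS
        self.current.memory[n] = value  # register n *is* address n

    def read_reg(self, n):
        assert 0 <= n < N_REGS
        return self.current.memory[n]


cpu = Cpu()
a, b = Process("a"), Process("b")
cpu.switch_to(a)
cpu.write_reg(3, 111)
cpu.switch_to(b)
cpu.write_reg(3, 222)
cpu.switch_to(a)
result = cpu.read_reg(3)  # process a's value survived both switches
print(result)
```

The switch itself touches no register words; whatever it costs is paid
later, as cache misses on the incoming process's low addresses.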
-- 
 Ian Dall           life (n). A sexually transmitted disease which afflicts
                              some people more severely than others.
idall@augean.oz

gnb@bby.oz (Gregory N. Bond) (07/28/89)

In article <550@augean.OZ> idall@augean.OZ (Ian Dall) writes:

   So why not put the "registers" in memory? If the "registers" are the
		[...]
   I suspect the problem is that there is still a significant gap between
   the access times of a register and of a cache location. Still, I kind
   of like the idea of an infinitely large demand paged register set!

You aren't wrong.  How many caches can do 2 reads/1 write per 50ns
cycle?  And how much do they cost?

The other problem is instruction bits.  It is easy to specify 3
addresses if each one is 5 bits wide, but much harder if they are
general addresses.
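
The arithmetic can be sketched as follows. The 32-bit address width is an
assumption chosen for illustration, and operand_bits and pack_operands are
hypothetical helpers, not any real encoder.

```python
def operand_bits(num_operands, bits_per_operand):
    """Total instruction bits spent on operand specifiers."""
    return num_operands * bits_per_operand


def pack_operands(fields, width):
    """Pack operand fields into one integer, first field most significant."""
    word = 0
    for field in fields:
        assert 0 <= field < (1 << width)
        word = (word << width) | field
    return word


reg_form = operand_bits(3, 5)    # three 5-bit register numbers
mem_form = operand_bits(3, 32)   # three general addresses, assuming 32 bits
packed = pack_operands([1, 2, 3], 5)
print(reg_form, mem_form, packed)
```

With these assumptions the register form leaves room for an opcode in a
32-bit instruction word, while the general-address form needs three words
for the operands alone.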

Yes, it would work, it would remove the nasty register selection phase
of the compiler, but you can make faster machines cheaper using
registers.

Greg.
--
Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au    non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb

slackey@bbn.com (Stan Lackey) (07/28/89)

In article <550@augean.OZ> idall@augean.OZ (Ian Dall) writes:
>All this talk of PDP 10s aliasing registers to the first locations of memory
>reminded me of something I was thinking about some time ago.
>So why not put the "registers" in memory? 

[Insert generic "it's been done" message here]

The TI 990/9900 did this - the CPU had a 'workspace pointer' which
pointed to R0, the base of the register file, hopefully in a section
of main memory that was implemented in Schottky RAM.  Thus, a context
switch would be a matter of loading only a few registers.  Actually,
register windowing could be simulated with this facility.

It would be tough to get good performance this way, because you need
real fast access to the register file, preferably multi-ported as
well.  Sure, the current register set could be moved in and out, but
that defeats the original purpose.
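
A rough sketch of the workspace-pointer idea, assuming a simplified
word-addressed memory (the real part is byte-addressed, with Rn at
WP + 2n). The class name and the method names are hypothetical; the
save of the old WP, PC and status into the new R13-R15 follows the
9900's BLWP/RTWP convention.

```python
class WorkspaceCpu:
    """Toy CPU whose 16 registers live in memory at mem[wp + n]."""

    def __init__(self, size_words=256):
        self.mem = [0] * size_words  # word-addressed for simplicity
        self.wp = 0   # workspace pointer: address of R0
        self.pc = 0
        self.st = 0   # status word

    def get(self, n):
        assert 0 <= n < 16
        return self.mem[self.wp + n]

    def set(self, n, value):
        assert 0 <= n < 16
        self.mem[self.wp + n] = value & 0xFFFF

    def blwp(self, new_wp, new_pc):
        # Context switch: load WP and PC, then stash the old context
        # in R13..R15 of the NEW workspace.
        old_wp, old_pc, old_st = self.wp, self.pc, self.st
        self.wp, self.pc = new_wp, new_pc
        self.set(13, old_wp)
        self.set(14, old_pc)
        self.set(15, old_st)

    def rtwp(self):
        # Return: reload WP, PC and status from R13..R15.
        self.wp, self.pc, self.st = self.get(13), self.get(14), self.get(15)


cpu = WorkspaceCpu()
cpu.wp, cpu.pc = 32, 100
cpu.set(0, 7)          # R0 of the first context
cpu.blwp(64, 200)      # switch: only WP, PC and status move
cpu.set(0, 9)          # R0 of the second context, a different memory word
cpu.rtwp()
print(cpu.wp, cpu.pc, cpu.get(0))
```

Every get and set here is a memory access, which is exactly the
performance problem: without fast (and ideally multi-ported) storage
behind the workspace, each register operand costs a memory cycle.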

Now would 40,000 other readers mind adding more architectures to the
list?  Preferably really old ones.  :-)
-Stan

baum@Apple.COM (Allen J. Baum) (07/29/89)

[]
>In article <550@augean.OZ> idall@augean.OZ (Ian Dall) writes:
>
>All this talk of PDP 10s aliasing registers to the first locations of memory
>reminded me of something I was thinking about some time ago.
> ........
>So why not put the "registers" in memory? If the "registers" are the
>first n locations of the virtual address space then they will not have
>to be saved and restored on context switches or interrupts.

Check out the ATT CRISP processor for an implementation of this idea.

--
		  baum@apple.com		(408)974-3385
{decwrl,hplabs}!amdahl!apple!baum

werme@Alliant.COM (Ric Werme) (08/01/89)

In article <43445@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>The TI 990/9900 did this - the CPU had a 'workspace pointer' which
>pointed to R0, the base of the register file, hopefully in a section
>of main memory that was implemented in Schottky RAM.
>
>Now would 40,000 other readers mind adding more architectures to the
>list?  Preferably really old ones.  :-)

How about the Zilog Z8 (not old enough).  It has 124 bytes of register
space and a "Register Pointer" that lets you define one of the 8 blocks
of registers as working registers which lets you get to them with smaller,
faster instructions.  The 16-bit instructions generally require a pair
of working registers for the address.  Nice machine for peripherals, modems,
etc.  I set up different register blocks with data for each interrupt routine,
so the interrupt code merely had to save RP.

An architectural issue - the Z8 convinced me that one of the key criteria
in choosing a processor (or designing one) is how general purpose the
general purpose registers are.

The Z8 is an eight bit machine, even more so than the Z80.  However, a 16
bit add was just two instructions:

	add	r4,r8
	adc	r5,r9

If you needed 32 bits, you need two more instructions:

	adc	r6,r10
	adc	r7,r11
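
The carry chain behind those add/adc sequences can be sketched like this.
It assumes the first add handles the low-order byte (the byte order of
the register assignment is not stated above), and add8 and add_multibyte
are hypothetical helper names.

```python
def add8(a, b, carry_in=0):
    """8-bit add with carry in; returns (result byte, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add_multibyte(dst, src):
    """Add two equal-length byte lists, least significant byte first.

    The first step is a plain add (carry in = 0); every later step is
    an add-with-carry, mirroring add, adc, adc, adc.
    """
    carry = 0
    out = []
    for a, b in zip(dst, src):
        byte, carry = add8(a, b, carry)
        out.append(byte)
    return out, carry


# 0x0001FFFF + 0x00000001, bytes listed low-order first
total, carry_out = add_multibyte([0xFF, 0xFF, 0x01, 0x00],
                                 [0x01, 0x00, 0x00, 0x00])
print([hex(b) for b in total], carry_out)
```

Because any register can take part in each step, widening the operation
only means adding more adc steps, which is the point being made about
general purpose registers.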

Special purpose register architectures like the Z80 (or [gack] 6502) are a
pain to extend.  Even the 68000's dichotomy between the address and data
registers causes problems.  A Japanese company made a "68200" which was
patterned after the 68000 but with more general purpose registers.  We found
that we could write smaller, faster code for it instead of the 68000 thanks
largely to not having to shuffle data between the A and D set.

Of course (how can I let it slip by?), the PDP-10 is a general purpose register
machine.  I could write an instruction on it and feel as though I had
accomplished something.  Writing "mov d0,a0" makes me wish I had a PDP-11!


-- 

| A pride of lions              | Eric J Werme                |
| A gaggle of geese             | uucp: decvax!linus!alliant  |
| An odd lot of programmers     | Phone: 603-673-3993         |

cliff@ficc.uu.net (cliff click) (08/01/89)

In article <43445@bbn.COM>, slackey@bbn.com (Stan Lackey) writes:
> The TI 990/9900 did this - the CPU had a 'workspace pointer' which
> pointed to R0, the base of the register file, hopefully in a section
> of main memory that was implemented in Schottky RAM.  Thus, a context
> 
> This would be tough to get performance, because you need real fast

Doesn't TI still make/market this chip?  I thought the later versions
implemented the registers as a write-thru cache (bleed-thru??) - they 
were just as fast as "normal" registers otherwise.  On a context switch
you just swapped the 'workspace pointer' and the cache mechanism loaded
registers as needed (and saved as needed).  I thought it was a good chip
that never caught on because TI can't market (at least not like *BM & *ntel).

-- 
Cliff Click, Software Contractor at Large
Business: uunet.uu.net!ficc!cliff, cliff@ficc.uu.net, +1 713 274 5368 (w).
Disclaimer: lost in the vortices of nilspace...       +1 713 568 3460 (h).

albaugh@dms.UUCP (Mike Albaugh) (08/04/89)

From article <3301@alliant.Alliant.COM>, by werme@Alliant.COM (Ric Werme):
> In article <43445@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>>The TI 990/9900 did this - the CPU had a 'workspace pointer' which
>>pointed to R0, the base of the register file, hopefully in a section
>>of main memory that was implemented in Schottky RAM.
>>
>>Now would 40,000 other readers mind adding more architectures to the
>>list?  Preferably really old ones.  :-)

	Funny you should ask :-)

> Special purpose register architectures like the Z80 (or [gack] 6502) are a
> pain to extend.

	If one considers the 256 locations that constitute "zero-page" on
the 6502 to _be_ its registers:

a) It qualifies for the question above.
b) it is very regular
c) The "x" register can now serve to reference an (albeit bounded) register
   "stack"
d) one will produce quite good code with substantially less pain than before
   this leap of intuition.

Seriously, the 6502 is a bit quirky, but many processors (incl the 6502) get
a whole lot easier to program once one "gets" certain keys to their intended
use.
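
A small sketch of the zero-page-as-registers view, using the 6502's
zero-page,X addressing mode, whose effective address wraps around within
page zero. REG_STACK_BASE and the helper names are hypothetical, chosen
only for illustration.

```python
mem = bytearray(65536)   # full 64K address space; page zero is mem[0:256]
REG_STACK_BASE = 0x80    # hypothetical base of the register "stack"


def zp_x_address(base, x):
    """Effective address for zero-page,X: always stays in page zero."""
    return (base + x) & 0xFF


def store_zpx(base, x, value):
    mem[zp_x_address(base, x)] = value & 0xFF


def load_zpx(base, x):
    return mem[zp_x_address(base, x)]


x = 0
store_zpx(REG_STACK_BASE, x, 0x11)   # first "register" in the frame
x += 1                               # bump X to move to the next slot
store_zpx(REG_STACK_BASE, x, 0x22)
top = load_zpx(REG_STACK_BASE, x)
wrapped = zp_x_address(0xF0, 0x20)   # wraps to 0x10, hence "bounded"
print(hex(top), hex(wrapped))
```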

>  Even the 68000's dichotomy between the address and data
> registers causes problems.

	No kidding :-(

> Of course (how can I let it slip by?), the PDP-10 is a general purpose register
> machine.  I could write an instruction on it and feel as though I had
> accomplished something.  Writing "mov d0,a0" makes me wish I had a PDP-11!

	Especially since mov d0,a0 is almost never what you mean :-(
(explanation, most assemblers default a move with no size spec to be "word"
aka 16 bits, but move to an A register sign-extends. Gotcha :-(  )
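
The gotcha can be sketched as follows: a word-sized move to an address
register takes the low 16 bits of the source and sign-extends them to 32
bits. The function names are hypothetical stand-ins for the two operand
sizes.

```python
def movea_w(source):
    """Word-sized move to an A register: low 16 bits, sign-extended."""
    word = source & 0xFFFF
    if word & 0x8000:
        word |= 0xFFFF0000
    return word


def movea_l(source):
    """Long-sized move to an A register: all 32 bits copied."""
    return source & 0xFFFFFFFF


d0 = 0x00018000
a0_word = movea_w(d0)   # what an unsized "mov d0,a0" typically gives
a0_long = movea_l(d0)   # what was probably meant
print(hex(a0_word), hex(a0_long))
```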

					Mike

(we use a _lot_ of 6502s and 68010s here, but then a lot of our customers
are cheapskates :-)

| Mike Albaugh (albaugh@dms.UUCP || {...decwrl!turtlevax!}weitek!dms!albaugh)
| Atari Games Corp (Arcade Games, no relation to the makers of the ST)
| 675 Sycamore Dr. Milpitas, CA 95035		voice: (408)434-1709
| The opinions expressed are my own (Boy, are they ever)

peter@ficc.uu.net (Peter da Silva) (08/04/89)

In article <3301@alliant.Alliant.COM>, werme@Alliant.COM (Ric Werme) writes:
> In article <43445@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
> >Now would 40,000 other readers mind adding more architectures to the
> >list?  Preferably really old ones.  :-)

The 1802 has 16 16-bit registers, one 8-bit register, and two 4-bit
registers. The 4-bit registers (P and X) point to the registers that
will be used as PC and SP. The 8-bit register is the accumulator. All
memory reads and writes go via the AC, so loading a register takes 4
instructions:

	Load value
	PLO reg	(Put LOw byte)
	Load value
	PHI reg (Put HIgh byte)

The subroutine call involved changing the P register to a new register
with the new PC in it, and changing back to the old register to return.
To save space a standard call and return technique was set up:

Five registers are set aside:

	PC	Program Counter
	SP	Stack Pointer
	LINK	Holds top word of stack.
	CALL	Holds call-routine address. Starts at CALLP
	RET	Holds return-routine address. Starts at RETP

Then you have the following code:

CALLP:	GLO	LINK
	STXD		Store via X and Decrement (push)
	GHI	LINK
	STXD
	GLO	PC
	PLO	LINK
	GHI	PC
	PHI	LINK	Now the old LINK is pushed, new LINK is old PC
	LDA	LINK
	PLO	PC
	LDA	LINK
	PHI	PC
	SEP	PC	Set Program counter to be PC
	BR	CALLP	Go back and do it again.
RETP:	GLO	LINK
	PLO	PC
	GHI	LINK
	PHI	PC	PC is now the return address held in LINK
	INC	SP
	LDXA		Load via X and increment (pop)
	PHI	LINK
	LDX
	PLO	LINK	Old LINK is restored from the stack
	SEP	PC
	BR	RETP

Now you could make a subroutine call:

	SEP	CALL
	DATA	SUBR	Address of subroutine, inline after instruction.
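
A rough model of what the call and return routines do, ignoring the
P-register switching itself. It assumes the inline address is stored low
byte first (the order in which CALLP loads it) and that the return path
restores LINK from the stack, which is what the push sequence in CALLP
implies. The class and method names are hypothetical.

```python
class CallReturnModel:
    """Byte-level model of the PC, SP and LINK registers."""

    def __init__(self, size=256):
        self.mem = bytearray(size)
        self.pc = 0
        self.sp = size - 1   # stack grows downward
        self.link = 0

    def _push(self, byte):
        # STXD: store via the stack pointer, then decrement it.
        self.mem[self.sp] = byte & 0xFF
        self.sp -= 1

    def call(self):
        """pc must point at the inline address following SEP CALL."""
        self._push(self.link & 0xFF)   # GLO LINK / STXD
        self._push(self.link >> 8)     # GHI LINK / STXD
        self.link = self.pc            # new LINK is the old PC
        low = self.mem[self.link]      # LDA LINK / PLO PC
        self.link += 1
        high = self.mem[self.link]     # LDA LINK / PHI PC
        self.link += 1                 # LINK now points past the address
        self.pc = (high << 8) | low

    def ret(self):
        self.pc = self.link            # resume after the inline address
        self.sp += 1
        high = self.mem[self.sp]       # LDXA: load, then increment
        self.sp += 1
        low = self.mem[self.sp]        # LDX: load, no increment
        self.link = (high << 8) | low


m = CallReturnModel()
m.link = 0x1234                # caller's own return address
m.pc = 0x10                    # just past a SEP CALL at 0x0F
m.mem[0x10] = 0x40             # inline subroutine address 0x0040,
m.mem[0x11] = 0x00             # low byte first
m.call()
in_sub = (m.pc, m.link)
m.ret()
print(in_sub, hex(m.pc), hex(m.link), m.sp)
```

LINK doubles as the return address and as the pointer used to fetch
inline arguments, which is why the caller's old LINK has to go on the
stack.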

> An architectural issue - the Z8 convinced me that one of the key criteria
> in choosing a processor (or designing one) is how general purpose the
> general purpose registers are.

1802 registers were quite general purpose :->, but the Forth inner
interpreter loop was *faster* than a normal subroutine call! It would
have helped if LDXA was LDAX instead...
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.
Business: peter@ficc.uu.net, +1 713 274 5180. | "The sentence I am now
Personal: peter@sugar.hackercorp.com.   `-_-' |  writing is the sentence
Quote: Have you hugged your wolf today?  'U`  |  you are now reading"

linimon@attctc.Dallas.TX.US (Mark Linimon) (08/05/89)

In article <3301@alliant.Alliant.COM>, werme@Alliant.COM (Ric Werme) writes:
> A Japanese company made a "68200" which was
> patterned after the 68000 but with more general purpose registers.  We found
> that we could write smaller, faster code for it instead of the 68000 thanks
> largely to not having to shuffle data between the A and D set.

Um, there may have been duplicate numbers or a cross-development situation,
but I do believe that the 68200 was a Mostek part.  As I recall my boss
(then of the Mostek SystemGroup, now part of Mizar) worked on the cross-
development tools for it on the VAX.

Mark Linimon
Mizar, Inc.
{attctc, convex, sun!texsun}!mizarvme!linimon