[net.arch] risc registers vs. cache memory

lee@fortune.UUCP (12/06/83)

No, I disagree that cache memory and registers are the same.
Cache memory is transparent to the machine architecture,
and there is no limit (except price) on the amount
of cache memory in a machine.  To address a large
number of registers, the register identifiers have to
be encoded in the instructions.  That puts a definite
limit on the number of registers, since more registers
mean longer instructions.

	Ed Lee
	fortune!lee
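
To put rough numbers on the encoding cost described above, here is a
minimal sketch.  The function names and the three-operand instruction
format are assumptions made for illustration; they do not describe any
particular machine.

```python
def register_field_bits(num_registers: int) -> int:
    """Bits needed to name one of num_registers registers in an instruction."""
    if num_registers < 1:
        raise ValueError("need at least one register")
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 1, without float rounding.
    return (num_registers - 1).bit_length()


def operand_bits(num_registers: int, operands: int = 3) -> int:
    """Total bits spent on register fields (three-operand format assumed)."""
    return operands * register_field_bits(num_registers)


if __name__ == "__main__":
    for n in (16, 32, 128, 1024):
        print(n, "registers:", operand_bits(n), "bits of register fields")
```

Under these assumptions, going from 16 to 1024 registers raises the
register fields of a three-operand instruction from 12 bits to 30,
whereas a cache of any size costs no instruction bits at all.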

jlw@ariel.UUCP (J.WOOD) (12/07/83)

There have been two microprocessors that I know about
that have put the registers in regular memory; the
BTL BELLMAC-8 and the TI 9900.  In addition I believe
that the registers of some of the UNIVAC 1100 series
machines either were or appeared to be in regular
address space, but I could be wrong on this.  I
disagree that registers and cache must be architecturally
different.  Putting the registers in address space
and then providing a cache with an access time
the same as on-board registers seems to me to
be a big win.  The cache management algorithms
available now would be ideal for the management
of a register file that is as big as all outdoors.
A possible improvement in performance in the case
of a register file miss could be obtained by
marking the page as registers: if the register
file pointer is being pushed and causes a miss,
then cache memory is only allocated, and the
read-through is omitted.



					Joseph L. Wood, III
					AT&T Information Systems
					Laboratories, Holmdel
					(201) 834-3759
					ariel!jlw
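
The "allocate only, omit the read-through" idea above can be sketched
as a toy cache.  Everything here is an assumption for illustration: the
class name, the four-word line size, LRU replacement, write-back on
every eviction, and a stack that grows toward higher addresses.  The
fetch is skipped only when a push misses on the first word of a line,
because only then is the rest of the line known to hold dead data.

```python
from collections import OrderedDict

LINE_WORDS = 4  # assumed line size, in words


class RegisterCache:
    """Toy fully associative, write-back cache with LRU replacement."""

    def __init__(self, memory, num_lines):
        self.memory = memory        # backing store: a list of words
        self.num_lines = num_lines
        self.lines = OrderedDict()  # line number -> list of words, LRU first
        self.line_fetches = 0       # lines read from backing store

    def _evict_if_full(self):
        if len(self.lines) >= self.num_lines:
            line_no, words = self.lines.popitem(last=False)
            base = line_no * LINE_WORDS
            self.memory[base:base + LINE_WORDS] = words  # write back

    def _line(self, addr, fetch):
        line_no = addr // LINE_WORDS
        if line_no in self.lines:
            self.lines.move_to_end(line_no)
        else:
            self._evict_if_full()
            if fetch:
                base = line_no * LINE_WORDS
                self.lines[line_no] = self.memory[base:base + LINE_WORDS]
                self.line_fetches += 1
            else:
                # Allocate only: old contents are dead, so skip the read.
                self.lines[line_no] = [0] * LINE_WORDS
        return self.lines[line_no]

    def read(self, addr):
        return self._line(addr, fetch=True)[addr % LINE_WORDS]

    def write(self, addr, value):
        """Ordinary store: a miss reads the line through first."""
        self._line(addr, fetch=True)[addr % LINE_WORDS] = value

    def push(self, addr, value):
        """Store via the register file pointer in a page marked as registers."""
        starts_line = addr % LINE_WORDS == 0
        self._line(addr, fetch=not starts_line)[addr % LINE_WORDS] = value
```

In this sketch, pushing sixteen consecutive words through a two-line
cache causes no line fetches, while sixteen ordinary writes to the same
addresses cause four.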

andrew@orca.UUCP (Andrew Klossner) (12/08/83)

A historical note on putting registers in the address space:

The PDP-10 architecture, based on 36-bit words, includes sixteen
registers which occupy locations 0 through 17 (octal) in the address
space.  In many ways this bit of orthogonality made programming and
code generation easier.  In other ways, it was an open invitation to
the following "clever" coding technique: moving small compute-intensive
loops into the registers.

As an example, the TECO editor, when inserting a character into its memory
buffer, had to visit successive words, shift each right 7 bits, insert
a 7-bit byte from the previous word, etc.  It did this with about 12
instructions, using 4 registers (numbers approximate from moldy
brainware memory).

The technique of putting loops in the registers proved to be successful
and was adopted by a variety of assembly language programs.  When the
current CPU, the KL10, came out, it turned out that, with its lightning
fast memory cache, fetching instructions from memory was *faster* than
fetching them from the registers, and all these programs had
intricately developed, carefully optimized register loops which ran
slower than they would had they been left in memory.

  -- Andrew Klossner   (decvax!tektronix!tekecs!andrew)      [UUCP]
                       (tekecs!andrew.tektronix@rand-relay)  [ARPA]
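
The word-by-word shift described above can be sketched as follows.
This is not TECO's code: it is Python rather than PDP-10 assembly, the
function names are made up, and the packing (five 7-bit bytes
left-justified in a 36-bit word, low-order bit unused) is assumed.

```python
BYTE_MASK = 0x7F           # one 7-bit character
SHIFTS = (28, 21, 14, 7, 0)  # byte positions within the 35 data bits


def pack(text):
    """Pack ASCII text into 36-bit words, five 7-bit bytes per word."""
    words = []
    for i in range(0, len(text), 5):
        chunk = text[i:i + 5].ljust(5, "\0")
        bytes35 = 0
        for ch in chunk:
            bytes35 = (bytes35 << 7) | (ord(ch) & BYTE_MASK)
        words.append(bytes35 << 1)  # low-order bit left unused
    return words


def unpack(words):
    """Inverse of pack(); trailing NUL padding is kept."""
    chars = []
    for word in words:
        bytes35 = word >> 1
        for shift in SHIFTS:
            chars.append(chr((bytes35 >> shift) & BYTE_MASK))
    return "".join(chars)


def insert_char(words, start, ch):
    """Insert ch as the first byte of words[start], in place.

    Each later word is shifted right by 7 bits and receives the byte
    that fell off the previous word.  Returns the byte pushed out of
    the last word.
    """
    carry = ord(ch) & BYTE_MASK
    for i in range(start, len(words)):
        bytes35 = words[i] >> 1
        spilled = bytes35 & BYTE_MASK           # byte leaving this word
        bytes35 = (carry << 28) | (bytes35 >> 7)
        words[i] = bytes35 << 1
        carry = spilled
    return carry
```

The loop body is only a handful of operations per word, which is why a
version of it small enough to live in the sixteen registers was
tempting.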

mdash@mh3bc1.UUCP (12/09/83)

PDP-10 and offspring have their registers in the address space, too.

Michael Scheer

b-davis@utah-cs.UUCP (Brad Davis) (12/10/83)

Since the DEC-10's and 20's have their 16 registers mapped to the
first 16 words of memory a small loop could be inserted into them
and would execute very fast (no memory references).  The problem 
with this is that in TOPS-20 you could deposit data into memory and 
start execution from the EXEC command processor.  If you put a jump 
to self in a register and start it executing, you can load the
machine badly, since there is no swapping and no memory access, and the 
program is all ready to run.  With only 1% of the machine guaranteed
I have been able to get over 30% on an overloaded machine.  If the 
machine is not loaded then the figure has gone up to 90+%.  This is 
not to suggest that anyone try this.  I was doing this as an experiment
only.  I think that it is a failure of the system that allows this to 
happen.


					Brad Davis
					..!harpo!utah-cs!b-davis
					b-davis@utah-cs.ARPA
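
A minimal sketch of what "registers mapped to the first 16 words of
memory" means for an address decoder.  The class and its reference
counter are hypothetical, and nothing here models TOPS-20 scheduling.

```python
NUM_REGISTERS = 16  # addresses 0 through 17 (octal)


class Machine:
    """Toy address space whose first 16 words are the register file."""

    def __init__(self, memory_words):
        self.registers = [0] * NUM_REGISTERS
        self.memory = [0] * memory_words
        self.memory_references = 0  # accesses that actually reach memory

    def load(self, addr):
        if addr < NUM_REGISTERS:
            return self.registers[addr]  # served by the register file
        self.memory_references += 1
        return self.memory[addr]

    def store(self, addr, value):
        if addr < NUM_REGISTERS:
            self.registers[addr] = value
        else:
            self.memory_references += 1
            self.memory[addr] = value

    def fetch_loop(self, start, length, iterations):
        """Fetch `length` instruction words from `start`, repeatedly."""
        for _ in range(iterations):
            for addr in range(start, start + length):
                self.load(addr)
        return self.memory_references
```

A four-word loop fetched from address 0 generates no memory references
in this model, however many times it runs; the same loop at address
100 (octal) generates four per iteration.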

andree@uokvax.UUCP (12/11/83)

uokvax!andree    Dec  9 08:13:00 1983

Johnson (at BTL) came up with a similar scheme. One of his
32 bit designs had no registers, just lots of cache. The CPU
kept pointers to memory that marked the top/bottom of the
cache address.

The idea was to keep the top of stack in cache, so stack-oriented
languages would win big. He did studies of most of the Unix tools,
and found that 80%+ (I think that is the correct figure) ran in less
than 512 bytes of stack, and only one program (this number I know to
be correct) took more than 1024 bytes of stack - Ritchie's recursive
descent C compiler.

This seems like a major win to me. I'd be interested in hearing from
anybody who is working on such a machine, or anything similar.

	<mike
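
A minimal sketch of a cache tracked by top and bottom pointers, as
described above.  The details are assumptions, not Johnson's design:
the stack grows toward lower addresses, the cache always covers the
`size` bytes nearest the stack pointer, and it is refilled eagerly on
a pop.

```python
class StackWindow:
    """Cache holding one contiguous run of stack bytes: [low, high)."""

    def __init__(self, size, stack_base):
        self.size = size
        self.base = stack_base  # address just past the oldest stack byte
        self.low = stack_base   # bottom pointer, equal to the stack pointer
        self.high = stack_base  # top pointer, one past the last cached byte
        self.spilled = 0        # bytes written back to memory
        self.filled = 0         # bytes read back from memory

    def contains(self, addr):
        """A reference hits if it falls between the two pointers."""
        return self.low <= addr < self.high

    def set_sp(self, sp):
        """Move the stack pointer and slide the window along with it."""
        if sp > self.base:
            raise ValueError("stack pointer above stack base")
        self.low = sp
        target_high = min(sp + self.size, self.base)
        if self.high > target_high:
            # Pushed past capacity: oldest bytes go back to memory.
            self.spilled += self.high - target_high
        elif self.high < target_high:
            # Popped: older bytes are brought back in (assumed eager).
            self.filled += target_high - self.high
        self.high = target_high
```

With a 512-byte window, a program whose stack never exceeds 512 bytes
never spills or fills in this model, which is the case the measurements
above suggest is the common one.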