lee@fortune.UUCP (12/06/83)
No, I disagree that cache memory and registers are the same. Cache memory is transparent to the machine architecture, and there is no limit (except price) to the amount of cache memory in a machine. To address a large number of registers, the register identifiers have to be encoded in the instructions, so there are definite limits (longer instructions) on the number of registers.

Ed Lee
fortune!lee
jlw@ariel.UUCP (J.WOOD) (12/07/83)
There have been two microprocessors that I know about that have put the registers in regular memory: the BTL BELLMAC-8 and the TI 9900. In addition, I believe that the registers of some of the UNIVAC 1100 series machines either were or appeared to be in the regular address space, but I could be wrong on this. I disagree that registers and cache must be architecturally different. Putting the registers in the address space and then providing a cache with an access time equal to that of on-board registers seems to me to be a big win. The cache management algorithms available now would be ideal for managing a register file that is as big as all outdoors. A possible performance improvement on a register-file miss: mark that page as registers, and if the register file pointer is being pushed and causes a miss, the cache line is only allocated and the read-through is omitted.

Joseph L. Wood, III
AT&T Information Systems Laboratories, Holmdel
(201) 834-3759
ariel!jlw
andrew@orca.UUCP (Andrew Klossner) (12/08/83)
A historical note on putting registers in the address space: The PDP-10 architecture, based on 36-bit words, includes sixteen registers which occupy locations 0 through 17 (octal) in the address space. In many ways this bit of orthogonality made programming and code generation easier. In other ways, it was an open invitation to the following "clever" coding technique: moving small compute-intensive loops into the registers. As an example, the TECO editor, when inserting a character into its memory buffer, had to visit successive words, shifting each right 7 bits, inserting a 7-bit byte from the previous word, etc. It did this with about 12 instructions, using 4 registers (numbers approximate from moldy brainware memory). The technique of putting loops in the registers proved successful and was adopted by a variety of assembly language programs. When the current CPU, the KL10, came out, it turned out that, with its lightning-fast memory cache, fetching instructions from memory was *faster* than fetching them from the registers, and all these programs had intricately developed, carefully optimized register loops which ran slower than they would have had they been left in memory.

-- Andrew Klossner (decvax!tektronix!tekecs!andrew) [UUCP]
(tekecs!andrew.tektronix@rand-relay) [ARPA]
mdash@mh3bc1.UUCP (12/09/83)
The PDP-10 and its offspring have their registers in the address space, too.

Michael Scheer
b-davis@utah-cs.UUCP (Brad Davis) (12/10/83)
Since the DEC-10's and 20's have their 16 registers mapped to the first 16 words of memory, a small loop can be inserted into them and will execute very fast (no memory references). The problem with this is that under TOPS-20 you could deposit data into memory and start execution from the EXEC command processor. If you put a jump-to-self in a register and start it executing, you can load the machine badly, since there is no swapping or memory access and the program is always ready to run. With only 1% of the machine guaranteed, I have been able to get over 30% on an overloaded machine. If the machine is not loaded, the figure has gone up to 90+%. This is not to suggest that anyone try this; I was doing it as an experiment only. I think it is a failure of the system that this is allowed to happen.

Brad Davis
..!harpo!utah-cs!b-davis
b-davis@utah-cs.ARPA
andree@uokvax.UUCP (12/11/83)
#R:ariel:-52600:uokvax:9900001:000:692 uokvax!andree Dec 9 08:13:00 1983

Johnson (at BTL) came up with a similar scheme. One of his 32-bit designs had no registers, just lots of cache. The CPU kept pointers to memory marking the top and bottom of the cached region. The idea was to keep the top of the stack in cache, so stack-oriented languages would win big. He did studies of most of the Unix tools and found that 80%+ (I think that is the correct figure) ran in less than 512 bytes of stack, and only one program (this number I know to be correct) took more than 1024 bytes of stack: Ritchie's recursive descent C compiler. This seems like a major win to me. I'd be interested in hearing from anybody who is working on such a machine, or anything similar.

<mike