idall@augean.OZ (Ian Dall) (07/27/89)
All this talk of PDP 10s aliasing memory to the first locations of memory reminded me of something I was thinking about some time ago.

The tendency is to go for large register sets. Large register sets have a penalty in context switch time (and possibly in procedure calls of separately compiled code). Various people have claimed that with a suitable cache, memory bandwidth is "not a problem". So why not put the "registers" in memory? If the "registers" are the first n locations of the virtual address space, then they will not have to be saved and restored on context switches or interrupts.

There are a few ways to think of this. It is either memory-mapped registers or (because these are in effect just scratchpad memory locations) equivalently a zero-register machine.

The data cache is essential to the scheme. Because the scratchpad locations are accessed frequently, the probability of a cache hit is very high. The compiler would have to know something of the cache dynamics so that it could decide whether it is better to keep something in a temporary register location (which is almost certain to be a cache hit) or to access a possibly distant location directly.

I suspect the problem is that there is still a significant gap between the access times of a register and of a cache location. Still, I kind of like the idea of an infinitely large demand-paged register set!
--
Ian Dall
life (n). A sexually transmitted disease which afflicts some people more severely than others.
idall@augean.oz
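Ian's proposal can be illustrated with a toy model. This is a minimal sketch, assuming a word-addressed memory and a hypothetical 8-register layout; the class names are invented for illustration. It shows the one property the post relies on: when the "registers" are simply the low words of each process's address space, a context switch reduces to changing which address space the CPU points at, and no register contents are copied anywhere.

```python
from typing import List, Optional

N_REGS = 8  # hypothetical register count for this toy model


class AddressSpace:
    """One process's word-addressed memory; words[0:N_REGS] serve as its registers."""

    def __init__(self, size: int = 64) -> None:
        self.words: List[int] = [0] * size


class MemoryRegisterCPU:
    """A CPU with no register file of its own, only a pointer to the current address space."""

    def __init__(self) -> None:
        self.space: Optional[AddressSpace] = None

    def switch_to(self, space: AddressSpace) -> None:
        # The entire context switch: no register contents are saved or restored.
        self.space = space

    def reg(self, n: int) -> int:
        if not 0 <= n < N_REGS:
            raise IndexError("no such register")
        return self.space.words[n]

    def set_reg(self, n: int, value: int) -> None:
        if not 0 <= n < N_REGS:
            raise IndexError("no such register")
        self.space.words[n] = value

    def add(self, rd: int, rs: int, rt: int) -> None:
        # A "register" add is really a memory-to-memory add on locations 0..N_REGS-1.
        self.set_reg(rd, self.reg(rs) + self.reg(rt))
```

For example, computing r0 = r1 + r2 in one address space, switching to a second one that writes its own r0, and switching back leaves the first r0 intact. In hardware, the cost Ian worries about is hidden in `reg`/`set_reg`: each one is a memory (ideally cache) access rather than a register-file access.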
gnb@bby.oz (Gregory N. Bond) (07/28/89)
In article <550@augean.OZ> idall@augean.OZ (Ian Dall) writes:
> So why not put the "registers" in memory? If the "registers" are the
> [...]
> I suspect the problem is that there is still a significant gap between
> the access times of a register and of a cache location. Still, I kind
> of like the idea of an infinitely large demand-paged register set!
You aren't wrong. How many caches can do 2 reads/1 write per 50ns
cycle? And how much do they cost?
The other problem is instruction bits. It is easy to specify 3
addresses if each one is 5 bits wide, but much harder if they are
general addresses.
Yes, it would work, and it would remove the nasty register selection phase
of the compiler, but you can make faster machines cheaper using
registers.
Greg.
--
Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb
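Greg's instruction-bits point is simple arithmetic: three 5-bit register fields cost 15 bits, while three general 32-bit addresses cost 96 bits, three times a whole 32-bit instruction word. The sketch below makes that concrete; the 17-bit opcode field and the field order are hypothetical, chosen only so that a three-register instruction fills exactly 32 bits.

```python
REG_BITS = 5      # 32 registers -> 5-bit register fields, as in Greg's example
OPCODE_BITS = 17  # hypothetical: whatever is left of a 32-bit instruction word


def operand_bits(n_operands: int, bits_per_operand: int) -> int:
    """Bits needed just to name the operands of one instruction."""
    return n_operands * bits_per_operand


def encode_reg3(opcode: int, rd: int, rs1: int, rs2: int) -> int:
    """Pack a hypothetical 32-bit three-register instruction: opcode | rd | rs1 | rs2."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit")
    for r in (rd, rs1, rs2):
        if not 0 <= r < (1 << REG_BITS):
            raise ValueError("register number does not fit in 5 bits")
    return (opcode << 3 * REG_BITS) | (rd << 2 * REG_BITS) | (rs1 << REG_BITS) | rs2
```

With memory-resident "registers" named by full addresses, `operand_bits(3, 32)` gives 96, so a three-address form no longer fits a fixed 32-bit word; that is the pressure behind base-pointer schemes such as the workspace pointer and register pointer described later in the thread.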
slackey@bbn.com (Stan Lackey) (07/28/89)
In article <550@augean.OZ> idall@augean.OZ (Ian Dall) writes:
>All this talk of PDP 10s aliasing memory to the first locations of memory
>reminded me of something I was thinking about some time ago.
>So why not put the "registers" in memory?

[Insert generic "it's been done" message here]

The TI 990/9900 did this: the CPU had a 'workspace pointer' which pointed to R0, the base of the register file, hopefully in a section of main memory that was implemented in Schottky RAM. Thus, a context switch would be a matter of loading only a few registers. Actually, register windowing could be simulated with this facility.

It would be tough to get performance, because you need real fast access to the register file, preferably multi-ported AND real fast. Sure, the current register set could be moved in and out, but that defeats the original purpose.

Now would 40,000 other readers mind adding more architectures to the list? Preferably really old ones. :-)

-Stan
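The workspace-pointer scheme can be sketched as follows. This is a simplified model, not a cycle-accurate emulator: it assumes the documented TMS9900 behaviour that R0-R15 are big-endian 16-bit words at WP + 2n, that BLWP loads a new WP and PC from a two-word vector and saves the old WP, PC and ST in the new workspace's R13, R14 and R15, and that RTWP restores them. Instruction fetch and status-flag semantics are ignored, and the class name is invented.

```python
class TMS9900Sketch:
    """Only WP, PC and ST live on chip; R0-R15 live in memory at WP + 2*n."""

    def __init__(self, mem_size: int = 0x10000) -> None:
        self.mem = bytearray(mem_size)
        self.wp = 0
        self.pc = 0
        self.st = 0

    def read_word(self, addr: int) -> int:
        return (self.mem[addr] << 8) | self.mem[addr + 1]  # big-endian words

    def write_word(self, addr: int, value: int) -> None:
        self.mem[addr] = (value >> 8) & 0xFF
        self.mem[addr + 1] = value & 0xFF

    def reg(self, n: int) -> int:
        return self.read_word(self.wp + 2 * n)

    def set_reg(self, n: int, value: int) -> None:
        self.write_word(self.wp + 2 * n, value & 0xFFFF)

    def blwp(self, vector: int) -> None:
        """Context switch: only three words are written, whatever the register count."""
        old_wp, old_pc, old_st = self.wp, self.pc, self.st
        self.wp = self.read_word(vector)
        self.pc = self.read_word(vector + 2)
        self.set_reg(13, old_wp)
        self.set_reg(14, old_pc)
        self.set_reg(15, old_st)

    def rtwp(self) -> None:
        """Return: reload WP, PC and ST from R13-R15 of the current workspace."""
        self.wp, self.pc, self.st = self.reg(13), self.reg(14), self.reg(15)
```

Because the caller's R0-R15 are never touched, they are still sitting in memory when RTWP switches back. Register windowing, as Stan suggests, would amount to moving WP by less than 32 bytes so that consecutive workspaces overlap.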
baum@Apple.COM (Allen J. Baum) (07/29/89)
>In article <550@augean.OZ> idall@augean.OZ (Ian Dall) writes:
>
>All this talk of PDP 10s aliasing memory to the first locations of memory
>reminded me of something I was thinking about some time ago.
> ........
>So why not put the "registers" in memory? If the "registers" are the
>first n locations of the virtual address space then they will not have
>to be saved and restored on context switches or interrupts.

Check out the AT&T CRISP processor for an implementation of this idea.
--
baum@apple.com (408)974-3385 {decwrl,hplabs}!amdahl!apple!baum
werme@Alliant.COM (Ric Werme) (08/01/89)
In article <43445@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>The TI 990/9900 did this - the CPU had a 'workspace pointer' which
>pointed to R0, the base of the register file, hopefully in a section
>of main memory that was implemented in Schottky RAM.
>
>Now would 40,000 other readers mind adding more architectures to the
>list? Preferably really old ones. :-)

How about the Zilog Z8 (not old enough)? It has 124 bytes of register space and a "Register Pointer" that lets you define one of the 8 blocks of registers as the working registers, which lets you get at them with smaller, faster instructions. The 16-bit instructions generally require a pair of working registers for the address. Nice machine for peripherals, modems, etc. I set up a different register block with data for each interrupt routine, so the interrupt code merely had to save RP.

An architectural issue: the Z8 convinced me that one of the key criteria in choosing a processor (or designing one) is how general purpose the general purpose registers are. The Z8 is an eight-bit machine, even more so than the Z80. However, a 16-bit add was just two instructions:

    add  r4,r8
    adc  r5,r9

If you needed 32 bits, you needed two more instructions:

    adc  r6,r10
    adc  r7,r11

Special purpose register architectures like the Z80 (or [gack] 6502) are a pain to extend. Even the 68000's dichotomy between the address and data registers causes problems. A Japanese company made a "68200" which was patterned after the 68000 but with more general purpose registers. We found that we could write smaller, faster code for it than for the 68000, thanks largely to not having to shuffle data between the A and D sets.

Of course (how can I let it slip by?), the PDP-10 is a general purpose register machine. I could write an instruction on it and feel as though I had accomplished something. Writing "mov d0,a0" makes me wish I had a PDP-11!
--
| A pride of lions          | Eric J Werme               |
| A gaggle of geese         | uucp: decvax!linus!alliant |
| An odd lot of programmers | Phone: 603-673-3993        |
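The add/adc chain works because any register can take part in a carry chain. The sketch below models sixteen 8-bit working registers and a carry flag; the class and helper names are hypothetical. Following the post's example, the register added first (r4, r8) holds the low byte, and each `adc` folds in the carry out of the previous byte.

```python
class Z8Regs:
    """Hypothetical model of sixteen 8-bit working registers plus a carry flag."""

    def __init__(self) -> None:
        self.r = [0] * 16
        self.carry = 0

    def add(self, dst: int, src: int) -> None:
        total = self.r[dst] + self.r[src]
        self.r[dst] = total & 0xFF
        self.carry = total >> 8  # 1 if the byte sum overflowed

    def adc(self, dst: int, src: int) -> None:
        total = self.r[dst] + self.r[src] + self.carry
        self.r[dst] = total & 0xFF
        self.carry = total >> 8

    def load(self, first: int, value: int, nbytes: int) -> None:
        # Low byte goes in the lowest-numbered register, as in the post's example.
        for i in range(nbytes):
            self.r[first + i] = (value >> (8 * i)) & 0xFF

    def value(self, first: int, nbytes: int) -> int:
        return sum(self.r[first + i] << (8 * i) for i in range(nbytes))


def add32(z: Z8Regs) -> None:
    """The four-instruction 32-bit add from the post: (r7..r4) += (r11..r8)."""
    z.add(4, 8)
    z.adc(5, 9)
    z.adc(6, 10)
    z.adc(7, 11)
```

Extending to 48 or 64 bits is just more `adc` lines; on an accumulator machine every byte would also have to be shuffled through the one register that can do arithmetic.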
cliff@ficc.uu.net (cliff click) (08/01/89)
In article <43445@bbn.COM>, slackey@bbn.com (Stan Lackey) writes:
> The TI 990/9900 did this - the CPU had a 'workspace pointer' which
> pointed to R0, the base of the register file, hopefully in a section
> of main memory that was implemented in Schottky RAM. Thus, a context
>
> This would be tough to get performance, because you need real fast

Doesn't TI still make/market this chip? I thought the later versions implemented the registers as a write-through cache (bleed-through??), so they were just as fast as "normal" registers otherwise. On a context switch you just swapped the 'workspace pointer' and the cache mechanism loaded registers as needed (and saved them as needed). I thought it was a good chip that never caught on because TI can't market (at least not like *BM & *ntel).
--
Cliff Click, Software Contractor at Large
Business: uunet.uu.net!ficc!cliff, cliff@ficc.uu.net, +1 713 274 5368 (w).
Disclaimer: lost in the vortices of nilspace... +1 713 568 3460 (h).
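The mechanism Cliff describes can be sketched generically. This is a hypothetical illustration of a write-through register cache in front of a memory-resident workspace, not a description of any actual TI design: memory is modelled as a word-addressed Python list, and the class and counter names are invented. Because every register write also goes to memory, switching the workspace pointer needs no save step at all; the cache only has to forget its copies.

```python
from typing import List, Optional


class WriteThroughWorkspaceCache:
    """Hypothetical 16-entry register cache in front of a memory-resident workspace."""

    N = 16

    def __init__(self, memory: List[int]) -> None:
        self.memory = memory          # word-addressed backing store
        self.wp = 0                   # workspace pointer (word index of R0)
        self.lines: List[Optional[int]] = [None] * self.N
        self.memory_reads = 0         # counts cache misses

    def read(self, n: int) -> int:
        if self.lines[n] is None:     # miss: fetch the register from the workspace
            self.lines[n] = self.memory[self.wp + n]
            self.memory_reads += 1
        return self.lines[n]

    def write(self, n: int, value: int) -> None:
        self.lines[n] = value
        self.memory[self.wp + n] = value  # write-through: memory is always current

    def switch_workspace(self, new_wp: int) -> None:
        # Nothing to save, since every write already reached memory; just invalidate.
        self.wp = new_wp
        self.lines = [None] * self.N
```

After a switch, registers are reloaded lazily on first use, which matches the "loaded registers as needed" part of the description; a write-back variant would instead have to flush dirty entries on every switch.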
albaugh@dms.UUCP (Mike Albaugh) (08/04/89)
From article <3301@alliant.Alliant.COM>, by werme@Alliant.COM (Ric Werme):
> In article <43445@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>>The TI 990/9900 did this - the CPU had a 'workspace pointer' which
>>pointed to R0, the base of the register file, hopefully in a section
>>of main memory that was implemented in Schottky RAM.
>>
>>Now would 40,000 other readers mind adding more architectures to the
>>list? Preferably really old ones. :-)

Funny you should ask :-)

> Special purpose register architectures like the Z80 (or [gack] 6502) are a
> pain to extend.

If one considers the 256 locations that constitute "zero page" on the 6502 to _be_ its registers:
 a) it qualifies for the question above;
 b) it is very regular;
 c) the "x" register can now serve to reference an (albeit bounded) register "stack";
 d) one will produce quite good code with substantially less pain than before this leap of intuition.

Seriously, the 6502 is a bit quirky, but many processors (incl. the 6502) get a whole lot easier to program once one "gets" certain keys to their intended use.

> Even the 68000's dichotomy between the address and data
> registers causes problems.

No kidding :-(

> Of course (how can I let it slip by?), the PDP-10 is a general purpose register
> machine. I could write an instruction on it and feel as though I had
> accomplished something. Writing "mov d0,a0" makes me wish I had a PDP-11!

Especially since mov d0,a0 is almost never what you mean :-( (Explanation: most assemblers default a move with no size spec to "word", aka 16 bits, but a move to an A register sign-extends. Gotcha :-( )

Mike (we use a _lot_ of 6502s and 68010s here, but then a lot of our customers are cheapskates :-)

| Mike Albaugh (albaugh@dms.UUCP || {...decwrl!turtlevax!}weitek!dms!albaugh)
| Atari Games Corp (Arcade Games, no relation to the makers of the ST)
| 675 Sycamore Dr. Milpitas, CA 95035 voice: (408)434-1709
| The opinions expressed are my own (Boy, are they ever)
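Mike's gotcha is easy to reproduce arithmetically. On the 68000 a word-sized move into an address register (MOVEA.W) takes the low 16 bits of the source and sign-extends them to fill all 32 bits of the address register, whereas a long move copies all 32 bits. The sketch below models only that data movement; the function names are hypothetical.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    low = value & 0xFFFF
    return low - 0x10000 if low & 0x8000 else low


def move_w_to_areg(dn: int) -> int:
    """Model of MOVEA.W Dn,An: low word of Dn, sign-extended to 32 bits."""
    return sign_extend_16(dn) & 0xFFFFFFFF


def move_l_to_areg(dn: int) -> int:
    """Model of MOVEA.L Dn,An: all 32 bits copied unchanged."""
    return dn & 0xFFFFFFFF
```

So if D0 holds the address $00008000 and the assembler quietly picks the word form, A0 ends up as $FFFF8000; and if D0 holds $00017FFF, the upper word is silently dropped, leaving $00007FFF. Writing the size explicitly (`move.l d0,a0`) avoids both surprises.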
peter@ficc.uu.net (Peter da Silva) (08/04/89)
In article <3301@alliant.Alliant.COM>, werme@Alliant.COM (Ric Werme) writes:
> In article <43445@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
> >Now would 40,000 other readers mind adding more architectures to the
> >list? Preferably really old ones. :-)

The 1802 has 16 16-bit registers, one 8-bit register, and two 4-bit registers. The 4-bit registers (P and X) select the registers that will be used as PC and SP. The 8-bit register is the accumulator. All memory reads and writes go via the accumulator, so loading a register takes 4 instructions:

        Load value
        PLO  reg        Put LOw byte
        Load value
        PHI  reg        Put HIgh byte

A subroutine call involved changing the P register to a new register with the new PC in it, and changing back to the old register to return. To save space a standard call and return technique was set up. Five registers are set aside:

        PC      Program Counter
        SP      Stack Pointer
        LINK    Holds top word of stack.
        CALL    Holds call-routine address. Starts at CALLP
        RET     Holds return-routine address. Starts at RETP

Then you have the following code:

CALLP:  GLO  LINK
        STXD            Store via X and Decrement (push)
        GHI  LINK
        STXD
        GLO  PC
        PLO  LINK
        GHI  PC
        PHI  LINK       Now the old LINK is pushed, new LINK is old PC
        LDA  LINK
        PLO  PC
        LDA  LINK
        PHI  PC
        SEP  PC         Set Program counter to be PC
        BR   CALLP      Go back and do it again.

RETP:   GLO  LINK
        PLO  PC
        GHI  LINK
        PHI  PC
        INC  SP
        LDXA            Load via X and increment (pop)
        PHI  LINK
        LDX
        PLO  LINK
        SEP  PC
        BR   RETP

Now you could make a subroutine call:

        SEP  CALL
        DATA SUBR       Address of subroutine, inline after instruction.

> An architectural issue - the Z8 convinced me that one of the key criteria
> in choosing a processor (or designing one) is how general purpose the
> general purpose registers are.

The 1802's registers were quite general purpose :->, but the Forth inner interpreter loop was *faster* than a normal subroutine call! It would have helped if LDXA was LDAX instead...
--
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.
Business: peter@ficc.uu.net, +1 713 274 5180.  |  "The sentence I am now
Personal: peter@sugar.hackercorp.com.  `-_-'   |  writing is the sentence
Quote: Have you hugged your wolf today?  'U`    |  you are now reading"
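To see what the 1802 standard call and return technique actually does to the registers and the stack, here is a small simulation of the CALLP and RETP instruction sequences. It is a sketch under stated assumptions: the register numbers are hypothetical, X is assumed to select SP throughout, the inline subroutine address is read low byte first exactly as the CALLP sequence does, and instruction fetch and the trailing BR are not modelled. In this model RETP puts the popped word back into LINK, which is what CALLP's push implies; popping it into PC would overwrite the return address just loaded.

```python
SP, PC, CALL, RET, LINK = 2, 3, 4, 5, 6  # hypothetical register numbers


class CDP1802Sketch:
    def __init__(self) -> None:
        self.R = [0] * 16        # sixteen 16-bit registers
        self.D = 0               # 8-bit accumulator
        self.P = PC              # selects the register used as program counter
        self.X = SP              # selects the register used for STXD/LDXA/LDX
        self.mem = bytearray(0x10000)

    # The handful of instructions the call/return routines use.
    def glo(self, r: int) -> None:
        self.D = self.R[r] & 0xFF

    def ghi(self, r: int) -> None:
        self.D = self.R[r] >> 8

    def plo(self, r: int) -> None:
        self.R[r] = (self.R[r] & 0xFF00) | self.D

    def phi(self, r: int) -> None:
        self.R[r] = (self.R[r] & 0x00FF) | (self.D << 8)

    def inc(self, r: int) -> None:
        self.R[r] = (self.R[r] + 1) & 0xFFFF

    def stxd(self) -> None:
        self.mem[self.R[self.X]] = self.D
        self.R[self.X] = (self.R[self.X] - 1) & 0xFFFF

    def lda(self, r: int) -> None:
        self.D = self.mem[self.R[r]]
        self.inc(r)

    def ldxa(self) -> None:
        self.lda(self.X)

    def ldx(self) -> None:
        self.D = self.mem[self.R[self.X]]

    def sep(self, r: int) -> None:
        self.P = r

    def callp(self) -> None:
        """Push LINK, LINK = old PC, then PC = inline address (LINK skips past it)."""
        self.glo(LINK); self.stxd()
        self.ghi(LINK); self.stxd()
        self.glo(PC); self.plo(LINK)
        self.ghi(PC); self.phi(LINK)
        self.lda(LINK); self.plo(PC)
        self.lda(LINK); self.phi(PC)
        self.sep(PC)

    def retp(self) -> None:
        """PC = LINK (the return point), then pop the caller's old LINK."""
        self.glo(LINK); self.plo(PC)
        self.ghi(LINK); self.phi(PC)
        self.inc(SP)
        self.ldxa(); self.phi(LINK)
        self.ldx(); self.plo(LINK)
        self.sep(PC)
```

Setting PC to the byte after a `SEP CALL` at $0100, with the inline address $0200 stored at $0101-$0102, a call leaves PC at $0200, LINK at $0103 and SP two bytes lower; a return then restores PC to $0103 and both LINK and SP to their values before the call. That is roughly 14 instructions on each side, which is why a Forth inner interpreter could beat it.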
linimon@attctc.Dallas.TX.US (Mark Linimon) (08/05/89)
In article <3301@alliant.Alliant.COM>, werme@Alliant.COM (Ric Werme) writes:
> A Japanese company made a "68200" which was
> patterned after the 68000 but with more general purpose registers. We found
> that we could write smaller, faster code for it instead of the 68000 thanks
> largely to not having to shuffle data between the A and D set.

Um, there may have been duplicate numbers or a cross-development situation, but I do believe that the 68200 was a Mostek part. As I recall, my boss (then of the Mostek System Group, now part of Mizar) worked on the cross-development tools for it on the VAX.

Mark Linimon
Mizar, Inc.
{attctc, convex, sun!texsun}!mizarvme!linimon