[comp.arch] An idea I'm kicking around

ccplumb@watnot.UUCP (04/15/87)

  I've been dreaming up a RISCy architecture in my spare time, and in the
course of trying to minimize the number of memory accesses per instruction,
I ran into the problem of handling JSR's.  A branch already requires an
extra fetch to fill the pipeline, and adding a stack push would make things
ugly.

What if JSR moved the return address into another register?  If the register
was R0 (a register hardwired with the constant 0), you'd have a JMP.  To nest
JSR's, the called procedure would need to save this register, but it needs to
save registers for locals anyway, so it shouldn't be too much of a hassle.
As far as a compiler is concerned, the return register is just another reg
that's trashed by all function calls and needs to be restored before exit.
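
A minimal sketch of the idea, in Python for brevity.  The Machine class, the
eight-register file, and the method names are all made up for illustration,
and it assumes one-word instructions so the return address is PC + 1.  Only
the behaviour (JSR drops the return address in a register, and naming R0
turns it into a JMP) comes from the description above.

```python
# Toy model of a register-linking JSR.  Hypothetical: the class, the register
# count and the one-word instruction size are assumptions, not a real ISA.

class Machine:
    def __init__(self, nregs=8):
        self.regs = [0] * nregs
        self.pc = 0

    def write_reg(self, n, value):
        # R0 is hardwired to zero, so writes to it are simply discarded.
        if n != 0:
            self.regs[n] = value

    def jsr(self, link, target):
        # Save the address of the next instruction in `link`, then jump.
        self.write_reg(link, self.pc + 1)
        self.pc = target

    def ret(self, link):
        # A return is just a jump through the link register.
        self.pc = self.regs[link]
```

With the PC at 100, m.jsr(7, 500) leaves 101 in R7 and no memory is touched;
m.jsr(0, 500) changes no register at all, which is a plain JMP.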

Also, if you were really tricky, you could nest a few levels without saving
any registers at all.  Caller and Callee need to agree on which register is
used to pass the return address, but if the calling sequence is known, you
can set up a different convention for every level of nesting.

I suppose if register windows were used, this idea would correspond to putting
the PC into the moveable window.

I'm sending this out for comment.  Are there any really serious bugs in it?
Would it be really wonderful?  Thanks for any advice.
--
	-Colin Plumb (watmath!watnot!ccplumb)

Silly quote:
There's a flaw in the ointment.

mason@tmsoft.UUCP (04/16/87)

In article <12884@watnot.UUCP> watmath!watnot!ccplumb (Colin Plumb) writes:
>
>  I've been dreaming up a RISCy architecture in my spare time, and in the
>course of trying to minimize the number of memory accesses per instruction,
>I ran into the problem of handling JSR's.  A branch already requires an
>extra fetch to fill the pipeline, and adding a stack push would make things
>ugly.
>
>What if JSR moved the return address into another register?  If the register

I'm not sure if this is currently in use by anyone, but I've also thought of
it in a RISCy stack machine (no, I don't think that's a conflict in terms) that I've
been doing thought experiments with.

I don't think you want the generality of being able to save the return in
ANY register (like you can do with the PDP11 (although with a stack push)).
I foresaw 2 different save places.  The compiler would then not have to
save the return register if the routine didn't call anything else, and
with 2 return registers, compiler-generated calls (like structure copy)
could use the second call/return pair.

This may also be advantageous for the stack machine I was thinking of
because the result on the TOS doesn't interfere with the return address
(you can get some of the advantage of a data stack+return stack with only one
real stack).
-- 
	../Dave Mason,	TM Software Associates	(Compilers & System Consulting)
	..!{utzoo seismo!mnetor utcsri utgpu lsuc}!tmsoft!mason

henry@utzoo.UUCP (Henry Spencer) (04/16/87)

> What if JSR moved the return address into another register? ...

As I recall, this is exactly what the call instruction on the original
Berkeley RISC designs does.

> ... Are there any really serious bugs in it?

None that are obvious.

> Would it be really wonderful? ...

I don't know about "really wonderful", but it seems a sensible thing to do.
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

rwa@auvax.UUCP (Ross Alexander) (04/17/87)

In article <12884@watnot.UUCP>, ccplumb@watnot.UUCP writes:
>   I've been dreaming up a RISCy architecture in my spare time, and [...]
> I ran into the problem of handling JSR's.
> [ ... ]
> What if JSR moved the return address into another register?
> 	-Colin Plumb (watmath!watnot!ccplumb)

As all old Waterloo MFCF hackers know ( :-) please!  ) that's the
way that Honeywell 6000's do (did?)  things - the JSR analogue
was an instruction called Transfer-Set-indeX (tsx) which jumped
and dropped the return address out into an index register of
your choice.  So if you called subr A via 'tsx 1,a' and A called
B via 'tsx 2,b' then the returns would be 'tra 0,2' and 'tra
0,1' in that order without any stack operations.  The fly was
that there were only 8 index registers, so one ended up using
one (the B compiler used index reg 7 (?)) to act as a stack
pointer, and doing loads and stores to fake pushing and popping
the return addresses.  Grotty.
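
The two-level tsx/tra sequence can be sketched as follows.  This is a rough
Python model, not Honeywell code: the function name, the symbolic addresses
("main+1", "A+1") and the list standing in for the index registers are all
invented for illustration.

```python
# Hypothetical model of tsx/tra linkage with a different index register
# per nesting level.  Addresses are symbolic strings made up for the example.

def run_nested_calls():
    x = [None] * 8          # the eight index registers
    trace = []              # where control goes after each transfer

    def tsx(reg, target, return_addr):
        # Transfer and Set indeX: drop the return address in x[reg], jump.
        x[reg] = return_addr
        return target

    def tra(reg):
        # 'tra 0,reg': transfer to the address held in x[reg].
        return x[reg]

    trace.append(tsx(1, "A", "main+1"))   # main calls A via 'tsx 1,a'
    trace.append(tsx(2, "B", "A+1"))      # A calls B via 'tsx 2,b'
    trace.append(tra(2))                  # B returns via 'tra 0,2'
    trace.append(tra(1))                  # A returns via 'tra 0,1'
    return trace
```

Control passes through A, B, back into A, and back into main without a
single store to memory, but each extra nesting level costs one more of the
eight registers.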

Of course, in hand written assembler (William Ince's APL
interpreter comes to mind) this trick worked very well.  But I
wouldn't care to maintain that code.  

Anyway, I think life is simpler with a conventional stack and
JSR/RTS instructions.

...!alberta!auvax!aubade!rwa  Ross Alexander, Athabasca University

louis@auvax.UUCP (Louis Schmittroth) (04/17/87)

The TSX was _the_ way to call a subroutine on the IBM-704, as I recall,
but the last time I coded one was about 27 years ago.  The 704 surely
belongs in the hall of fame as one of the very successful vacuum tube
scientific and engineering computers in the 1950's.

bcase@amdcad.AMD.COM (Brian Case) (04/18/87)

In article <137@auvax.UUCP> rwa@auvax.UUCP (Ross Alexander) writes:
>In article <12884@watnot.UUCP>, ccplumb@watnot.UUCP writes:
>> What if JSR moved the return address into another register?
>
>As all old Waterloo MFCF hackers know ( :-) please!  ) that's the
>way that Honeywell 6000's do (did?)  things - the JSR analogue
>was an instruction called Transfer-Set-indeX (tsx) which jumped
>and dropped the return address out into an index register of
>your choice.
>
>Anyway, I think life is simpler with a conventional stack and
>JSR/RTS instructions.

Just as a data point, the Am29000 call instruction (like call-indirect;
these are the only two "call" instructions) has an 8-bit destination field
for specifying the general register into which the return address is
placed.  This is good in that it allows the user to define the most
appropriate procedure-call mechanism for his particular environment.
(Actually, not many people will design unique procedure-call mechanisms,
but the flexibility is important for some special applications.)

A conventional stack and jsr/rts instruction set is great as long as
it does what you want.  As soon as the match isn't right, the trouble
begins:  you're stuck with what the machine gives you.  The good ol'
calls instruction on the VAX is nice but it is too slow because it
does too much.

    bcase

mash@mips.UUCP (John Mashey) (04/19/87)

In article <12884@watnot.UUCP> watmath!watnot!ccplumb (Colin Plumb) writes:
>
>
>  I've been dreaming up a RISCy architecture in my spare time, and in the
>course of trying to minimize the number of memory accesses per instruction,
>I ran into the problem of handling JSR's.  A branch already requires an
>extra fetch to fill the pipeline, and adding a stack push would make things
>ugly.
>
>What if JSR moved the return address into another register?  ....

MIPS R2000's do this.  It works fine.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

robison@uiucdcsb.cs.uiuc.edu (04/20/87)

In article <12884@watnot.UUCP>, ccplumb@watnot.UUCP writes:
>   I've been dreaming up a RISCy architecture in my spare time, and [...]
> I ran into the problem of handling JSR's.
> [ ... ]
> What if JSR moved the return address into another register?
> 	-Colin Plumb (watmath!watnot!ccplumb)

The PC/RT processor chip stores the return address in a register.  This 
method is quite simple, since upon entry to most subroutines, the processor
saves a block of registers on the stack anyway.  In the case of a leaf 
procedure (a procedure with no embedded procedure calls), the processor
can avoid the save.
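
As a rough illustration of the leaf-procedure point, here is a Python sketch
of the decision a compiler would make.  The call-graph representation and the
function name are assumptions for the example, not anything taken from the
PC/RT toolchain.

```python
# Hypothetical sketch: decide which procedures must save the link register.
# call_graph maps each procedure name to the list of procedures it calls.

def procedures_needing_link_save(call_graph):
    # A procedure that makes any call will have its link register
    # overwritten by that call, so it must save it.  Leaves need not.
    return {name for name, callees in call_graph.items() if callees}
```

For a graph where main calls f and g, f calls g, and g calls nothing, only
main and f need the save; g returns straight through the register.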

Arch D. Robison
University of Illinois at Urbana-Champaign

CSNET: robison@UIUC.CSNET
UUCP: {ihnp4,pur-ee,convex}!uiucdcs!robison
ARPA: robison@B.CS.UIUC.EDU (robison@UIUC.ARPA)

gerryg@laidbak.UUCP (Gerry Gleason) (04/20/87)

In article <136@tmsoft.UUCP> mason@tmsoft.UUCP (Dave Mason) writes:
>I'm not sure if this is currently in use by anyone, but I've also thought of
>it in a RISCy stack machine (no, I don't think that's a conflict in terms) that I've
>been doing thought experiments with.

There is one I know of, CRISP (C Reduced Instruction Set Processor) at
AT&T.  I did some work on the kernel for an experimental prototype of
this machine.  It's basically a risc processor with a top of stack cache.  
It seemed like a good architecture, but I don't know what they are doing
with it.

gerry gleason

firth@sei.cmu.edu (Robert Firth) (04/20/87)

In article <12884@watnot.UUCP> watmath!watnot!ccplumb (Colin Plumb) writes:
>  I've been dreaming up a RISCy architecture in my spare time...

>What if JSR moved the return address into another register?  If the register
>was R0 (a register hardwired with the constant 0), you'd have a JMP.  To nest
>JSR's, the called procedure would need to save this register, but it needs to
>save registers for locals anyway, so it shouldn't be too much of a hassle.
>As far as a compiler is concerned, the return register is just another reg
>that's trashed by all function calls and needs to be restored before exit...

As someone who has implemented several languages on several machines,
perhaps my thoughts might be helpful.  In the majority of the codegenerators
I've written, the first instruction of a procedure retrieves the return
link from the place where the hardware put it.  For example, the CA LSI-2
stores the return link inline before the called routine, so if you want
recursion or reentrancy you've got to move it.  The PDP-11 puts it on
the SP stack, so if you want to allocate local variables towards high
addresses you have to pop it off again or grow two stacks.  And so on.

The machines I've liked best stored the return link in a register.  Not
just for that reason; in addition they have both been very clean pieces
of hardware (thanks, Perkin-Elmer, for the PE3200; thanks, MIPS, for MIPS),
but one aspect of that cleanliness is that they don't try to tell
language implementors how to think.

You definitely have my vote for using a register.

Another issue is the right operand of the JSR.  Most machines seem to
use the "effective address" as the operand, so whereas LOAD F will fetch
the VALUE in F, JSR F will jump to the ADDRESS of F. I have never liked
this.  You lose nothing, and gain a lot, by evaluating the operand in
Rmode, so that JSR #F calls F, JSR F calls the thing pointed to by F,
and JSR Reg calls the thing whose value has been computed in the register.
But this is an eccentric view.
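
To make the distinction concrete, here is a small Python sketch of the two
ways of resolving a JSR operand.  The operand encoding (a kind/value pair),
the dictionaries standing in for memory and registers, and the function names
are all assumptions made up for the example.

```python
# Hypothetical operand model: ("immediate", F), ("direct", F), ("register", r).

def jsr_target_effective_address(operand, memory, regs):
    # Conventional reading: the operand's ADDRESS is the jump target.
    kind, value = operand
    if kind == "direct":        # JSR F jumps to the address F itself
        return value
    raise ValueError("no effective address for operand kind " + kind)

def jsr_target_rmode(operand, memory, regs):
    # Rmode reading: evaluate the operand as LOAD would; the VALUE is the target.
    kind, value = operand
    if kind == "immediate":     # JSR #F calls F
        return value
    if kind == "direct":        # JSR F calls the thing pointed to by F
        return memory[value]
    if kind == "register":      # JSR Reg calls the address computed in Reg
        return regs[value]
    raise ValueError("unknown operand kind " + kind)
```

With memory = {0x100: 0x2000}, the operand ("direct", 0x100) resolves to
0x100 under the conventional rule and to 0x2000 under Rmode, so Rmode gets
the indirect call without a separate addressing mode.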

baum@apple.UUCP (Allen J. Baum) (04/21/87)

The Bell Labs CRISP call instruction saves the return address in the stack,
which happens to be in the registers because of their stack cache.

The HP Spectrum call instructions put the return address into any of the GPRs.

I think even the old PDP-6/10/20 had instructions that did exactly that.

{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385

paul@unisoft.UUCP (Paul Campbell) (04/21/87)

In article <136@tmsoft.UUCP> mason@tmsoft.UUCP (Dave Mason) writes:
>I'm not sure if this is currently in use by anyone, but I've also thought of
>it in a RISCy stack machine (no, I don't think that's a conflict in terms) that I've
>been doing thought experiments with.


	The INMOS Transputer is also a RISCy stack machine.  It only has 3
stack registers, but they claim that that is all you need for most
expressions.  (It also has 2-4k of 50ns on-chip RAM .....)

		Paul Campbell
		..!ucbvax!unisoft!paul

kenny@uiucdcsb.UUCP (04/21/87)

/* Written 12:52 pm  Apr 20, 1987 by firth@sei.cmu.edu in uiucdcsb:comp.arch */
In article <12884@watnot.UUCP> watmath!watnot!ccplumb (Colin Plumb) writes:

>  I've been dreaming up a RISCy architecture in my spare time...
>What if JSR moved the return address into another register?

Then firth@sei.cmu.edu replies:
[arguments for storing return address for branches in register]
<Another issue is the right operand of the JSR.  Most machines seem to
<use the "effective address" as the operand, so whereas LOAD F will fetch
<the VALUE in F, JSR F will jump to the ADDRESS of F. I have never liked
<this.  You lose nothing, and gain a lot, by evaluating the operand in
<Rmode, so that JSR #F calls F, JSR F calls the thing pointed to by F,
<and JSR Reg calls the thing whose value has been computed in the register.
<But this is an eccentric view.

Not really; if you look at the PDP-11 architecture, it appears that a
jump is in fact a move-immediate to the program counter.  But why not
just treat the program counter as another register, that happens to be
used in auto-increment mode by the hardware?  The subroutine linkage
operations would expand to two-instruction pairs, with a RISC-y
flavor:

	Call			Return
	MOVE	PC, Rn		ADD	#<size of jump>, Rn
	MOVE	#<subr>, PC	MOVE	Rn, PC

If you have three-address operations, it's simpler:

	Call					Return
	ADD	#<size of jump>, PC, Rn		MOVE	Rn, PC
	MOVE	#<subr>, PC

It's probably worthwhile combining the operations in the hardware,
because procedure linkage is so expensive, *provided*, of course, that
the hardware designer can do it cheaply.  It's really nice, though,
being able to use the PC as a general register -- I can't think of
applications where I'd want to multiply or divide by it, but a
load-direct from memory is useful for branch tables and the like,
while having it available as an operand really cleans up
position-independent coding.
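
The two-address call/return pair above can be traced with a short Python
sketch.  It assumes, PDP-11 style, that reading the PC inside an instruction
yields the address of the NEXT instruction, that 'MOVE PC, Rn' is one word,
and that the jump is two words (JUMP_SIZE); these sizes and the dictionary
standing in for the register file are illustrative assumptions only.

```python
# Hypothetical trace of:  MOVE PC, Rn / MOVE #<subr>, PC  ...  ADD #size, Rn / MOVE Rn, PC

JUMP_SIZE = 2   # assumed size in words of 'MOVE #<subr>, PC'

def call(regs, rn, subr):
    # On entry regs["PC"] is the address of 'MOVE PC, Rn'.
    regs["PC"] += 1           # fetching 'MOVE PC, Rn' advances the PC
    regs[rn] = regs["PC"]     # MOVE PC, Rn: Rn now holds the jump's address
    regs["PC"] += JUMP_SIZE   # fetching the jump advances the PC again
    regs["PC"] = subr         # MOVE #<subr>, PC

def ret(regs, rn):
    regs[rn] += JUMP_SIZE     # ADD #<size of jump>, Rn: step over the jump
    regs["PC"] = regs[rn]     # MOVE Rn, PC
```

Starting with the call sequence at address 10, Rn is left holding 11 (the
jump itself), which is why the return has to add the size of the jump before
loading the PC with 13.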

phil@osiris.UUCP (Philip Kos) (04/21/87)

In article <1061@aw.sei.cmu.edu>, firth@sei.cmu.edu (Robert Firth) writes:
> In article <12884@watnot.UUCP> watmath!watnot!ccplumb (Colin Plumb) writes:
> 
> >What if JSR moved the return address into another register?
>
> .... The PDP-11 puts it on
> the SP stack, ....

Yeah, well, not if you tell it not to.  As I recall, the PDP-11 JSR
instruction allows you to specify *any* of the eight "GPRs" to hold the
current PC contents, after pushing the old value of the specified register
onto the stack.  (Among other purposes, this was used [with R5] for
passing the address of an in-line FORTRAN parameter address block - the
scheme was referred to in RT-11 manuals as "subroutine linkage".)

And to think I was feeling left out back when everyone was discussing the
old IBM machines in comp.misc, just because I never wrote assembly language
on anything more primitive than a 370!  I guess I'm not as young as I
thought I was... :-)

-- 
                              ...!decvax!decuac -
Phil Kos                                          \
The Johns Hopkins Hospital    ...!seismo!mimsy  - -> !aplcen!osiris!phil
Baltimore, MD                                     /
                              ...!allegra!mimsy -

"And you'll be my duchess, my duchess of prunes!" - F. Zappa

utterback@husc4.HARVARD.EDU (Brian Utterback) (04/21/87)

The original poster was wondering whether anyone still used the method of 
storing the return address of a jmp in a general register.  I can answer yes.
The Cray-2 has as its equivalent of "JSR" the instruction
"r,Ai  Ak"
Which  branches to the address held in register Ak and stores the return
address in Ai.

Brian Utterback
The above opinions are not really held by anyone, especially my employer.

watson@convexs.UUCP (04/22/87)

I believe the Berkeley RISC I and RISC II chips did just this.
See the excellent doctoral thesis by Katevenis
from UCB. This thesis won the ACM doctoral award a year or two ago.

jfh@killer.UUCP (04/22/87)

Changing JSR/BSR/CALL etc to a save-pc-and-jump instruction is OLD.  Probably
older than me.  The first time I saw it was in IBM land.  Correct me if I am
wrong about the System 360, but calls with it used a Branch And Link instruction
that did exactly what was described.

The PDP-11 family did weird things with JSR.  The format was JSR reg,func.
The current value of REG was stacked (voila - recursion is born), the current
PC was saved in REG, and the address of FUNC was loaded into the PC.  Note that
	JSR	PC, FUNC
does exactly (well, something similar to) what JSR FUNC does on Vax/MC68000/J-Random Chip.
You return with RTS reg, where (unless you are into co-routines or weird
results) reg is the same one used in the JSR.  The microcode for the RTS 
instruction moves REG into PC, to restore the return address (note that this
is a no-op if REG is PC) and then pops the stack into REG (this is where
the return address gets loaded if REG == PC).
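
The JSR/RTS steps just described can be written out as a small Python sketch.
The dictionary standing in for the CPU state and the function names are
assumptions for illustration; the PC is taken to have already advanced past
the JSR when it is saved.

```python
# Hypothetical model of the PDP-11 linkage steps described above.
# cpu holds "PC", the general registers, and a Python list as the stack.

def jsr(cpu, reg, func):
    cpu["stack"].append(cpu[reg])   # stack the current value of REG
    cpu[reg] = cpu["PC"]            # save the current PC in REG
    cpu["PC"] = func                # load the address of FUNC into the PC

def rts(cpu, reg):
    cpu["PC"] = cpu[reg]            # move REG into the PC (no-op if REG is PC)
    cpu[reg] = cpu["stack"].pop()   # pop the stack into REG
```

With R5 as the link register, the old R5 ends up on the stack and the return
address in R5.  With the PC itself as the link register, the value stacked is
the return address, so the pair behaves like a conventional stack-based
call and return.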

My favorite uP of all time is the RCA CDP1802.  It was popular (?) long
before the 8088, had 16 registers, all 16 bits, and was CMOS.  The best
part was that it was a truly general purpose register machine.  Any one
register could be made into the PC by some instruction (whose name I forgot),
and the SP could also be changed.

- John.		(jfh@killer.UUCP)

Disclaimer:
	No disclaimer.  Whatcha gonna do, sue me?

jfh@killer.UUCP (04/22/87)

In article <1061@aw.sei.cmu.edu>, firth@sei.cmu.edu (Robert Firth) writes:
> The machines I've liked best stored the return link in a register.  Not
> just for that reason; in addition they have both been very clean pieces
> of hardware (thanks, Perkin-Elmer, for the PE3200; thanks, MIPS, for MIPS),
> but one aspect of that cleanliness is that they don't try to tell
> language implementors how to think.
> 
> You definitely have my vote for using a register.
> 
> Another issue is the right operand of the JSR.  Most machines seem to
> use the "effective address" as the operand, so whereas LOAD F will fetch
> the VALUE in F, JSR F will jump to the ADDRESS of F. I have never liked
> this.  You lose nothing, and gain a lot, by evaluating the operand in
> Rmode, so that JSR #F calls F, JSR F calls the thing pointed to by F,
> and JSR Reg calls the thing whose value has been computed in the register.
> But this is an eccentric view.

I read someone's complaint about the vax CALLS/CALLG instructions in an
article I read after posting my reply to the original article, so I
apologize for posting again.

DEC has been doing things the *right* way since the early 70's with the
PDP-11 and later with the Vax.  JSR on a PDP-11 saves the return address
in a register.  JSR on a Vax (same for a PDP-11, but the Vax has more
and harder to figure out modes :-)  takes a dozen or so addressing modes
(the illegal ones are the most fun (what does JSR (R15) _really_ do?)).

Trivia question - what does TSTW -(R15) do and is it legal?

Other thoughts I thought of...

Motorola does what DEC does (kinda) with the M68000 family.  Except for
the abundance of modes.  JMP (A0,D0) is my favorite code to generate for
switches and JSR (A0,D0) probably has some equally handy uses (like
device or file system switches (see "Unlinking '.'" in comp.bugs.sys5)).

Yes, I am all for saving the return address in a register, makes interpreters
and such much easier - co-routines and other non-standard constructions
work much better also.  I am also for having a way to stack the return
address automatically (And Guy Harris thought the PDP-11 was really dead ...
Here is yet another not-yet-dead feature ...)  The only problem with the
DEC approach is that it can't be pipelined too much since the branch address
and the old PC may both want to be loaded into the PC at the same time.
Of course, this is a problem with the uCode coder.

- John.		(jfh@killer.UUCP)

Disclaimer:
	No disclaimer.  Whatcha gonna do, sue me?

greg@utcsri.UUCP (04/28/87)

In article <1061@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP (Robert Firth) writes:
>Another issue is the right operand of the JSR.  Most machines seem to
>use the "effective address" as the operand, so whereas LOAD F will fetch
>the VALUE in F, JSR F will jump to the ADDRESS of F. I have never liked
>this.  You lose nothing, and gain a lot, by evaluating the operand in
>Rmode, so that JSR #F calls F, JSR F calls the thing pointed to by F,
>and JSR Reg calls the thing whose value has been computed in the register.
>But this is an eccentric view.

You do lose something. If the 68000 JSR worked this way, e.g., there
would be no way to JSR relative to the PC. JMP is effectively
LEA <address>,PC and JSR is the same thing with a push. So do you think
LEA is useless?

Even if you don't need a PC-relative address, it is often more efficient
than an absolute one.

So what do you gain? Of course you can indirect one level further,
but does that gain you a lot? It doesn't with CISC addressing modes,
anyway.

The PDP-11 provides other jump options: MOV ...,PC which is an Rmode
jump, and ADD ...,PC which is a relative jump with general-address offset.

As for 'JSR Reg', this is illegal on the 68K and PDP, where it is
replaced for free by 'JSR (Reg)'. The analogous op on an NS32k is
JSR 0(reg), with one byte for the 0, since non-indexed register indirection
is not provided. You can say JSR reg, which is the same thing for
different reasons: a register in an address context is assumed to
contain the address.

-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...