[comp.arch] Coroutine switching

zs01+@andrew.cmu.edu (Zalman Stern) (06/21/89)

(I'm a bit behind in comp.arch. This posting is in regard to doing
coroutines or lightweight processes on register-windowed machines.)

As the person who wrote the MIPS R2000 and Sun SPARC assembly code for
our lightweight process (LWP) package, I can say a little bit about
this. The R2000 port took about 2 hours; the SPARC port took 2 days. If
nothing else, register windows are harder to understand.

The main problem with the SPARC was understanding how things work, like
the fact that the frame pointer and the stack pointer are in the same
place for adjacent windows. That is, sp in the current window before a
save instruction is fp in the current window after the save. Also, one
must understand what the kernel does to flush windows. Basically, if the
kernel ever has to flush a window, it stores that window at the
address in that window's fp register. It is very important that fp always
be correct; otherwise a UNIX context switch could store the window into
a random place. This was the hardest part of debugging the code: at
random times a register window would be stored into the middle of
another process's stack. Also, the kernel keeps track of which windows are
user context and which are kernel context. This is so that the kernel
can flush all user context only when necessary.
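
To make the sp/fp aliasing concrete, here is a small C model of the
window overlap. It is only an illustrative sketch, not part of the LWP
package: the window count (8), the frame size (96 bytes), and every name
in it are assumptions, and it ignores window overflow/underflow traps
and the local registers entirely.

```c
#include <assert.h>
#include <stdint.h>

enum { NWINDOWS = 8 };      /* assumed number of register windows */
enum { SP = 6, FP = 6 };    /* %sp is %o6, %fp is %i6 */

/* Each window owns 8 "in" registers. A window's "out" registers are not
 * separate storage: they are the "in" registers of the next window down. */
static uint32_t ins[NWINDOWS][8];
static int cwp = 0;         /* current window pointer */

static int wrap(int w) { return ((w % NWINDOWS) + NWINDOWS) % NWINDOWS; }

static uint32_t *in_reg(int i)  { return &ins[cwp][i]; }
static uint32_t *out_reg(int i) { return &ins[wrap(cwp - 1)][i]; }

/* Model of "save %sp, imm, %sp": the sum is computed from the old
 * window's %sp, and the result is written to the new window's %sp. */
static void save_sp(int32_t imm)
{
    uint32_t new_sp = *out_reg(SP) + (uint32_t)imm;
    cwp = wrap(cwp - 1);
    *out_reg(SP) = new_sp;
}

/* Model of "restore" with no operands: step back to the caller's window. */
static void restore_window(void)
{
    cwp = wrap(cwp + 1);
}

/* Returns 1 if the callee's %fp equals the caller's %sp after a save. */
static int fp_aliases_caller_sp(uint32_t caller_sp)
{
    *out_reg(SP) = caller_sp;
    save_sp(-96);
    int same = (*in_reg(FP) == caller_sp);
    restore_window();
    return same;
}
```

Calling fp_aliases_caller_sp(0x1000) returns 1 in this model: the old
%sp and the new %fp are the same physical register, so nothing has to be
copied.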

SunOS provides a trap to flush all the windows to the stack. Combined
with the fact that the kernel never restores more than one window at a
time, this is all you need to write a coroutine package. If the kernel
"prefetched" register windows, you would have a problem: you flush the
windows, a UNIX context switch happens on the next instruction, and you
come back with more than one window loaded. The extra window will get
flushed later, which will write over some other LWP's stack...

I also looked at doing this for the AMD 29000. At least there,
everything is in user space and I wouldn't have had to worry about what
the kernel was doing. I think that would have been reasonably easy,
although one still wants a trap to save all the registers. That way, the
register saving code can occur once in the entire system and it might
even be in the cache. This might be important when you need code to
potentially save 128 registers.

Also, I don't think comments on context switching apply in these
situations. If register saving is only 5% of your context switch for an
LWP package, then you have done something dreadfully wrong. The two
register window implementations I know of (SPARC and AMD 29k) both hedge
their bets on this one by letting you bust up the register file into
pieces and dedicating each piece to a single LWP. Then all you have to
do is write code to handle caching of contexts in the register file and
flushing/restoring them when necessary. This wouldn't be too hard and
would probably have great performance. However, in the case of the
SPARC, SunOS 4.0 and the Sun compilers make this impossible.

Anyway, here's the code. The routines used by this package are
savecontext and returnto. Savecontext takes a function to call, an area
in which to save state, and possibly a stack to switch to. Returnto
takes an area and returns to the context that last called savecontext
with that area. Generally, an area is just a pointer to a long that
holds a stack pointer. In the case of the SPARC, I save the global
registers into the area as well since I don't know what the calling
convention says about saving/restoring the globals. The MIPS code
assumes the assembler is going to do reordering.

# savecontext(f, area1, newsp)
#     int (*f)(); struct savearea *area1; char *newsp;

# returnto(area2)
#     struct savearea *area2;
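
For readers who want to try the save/resume pattern without the
assembly, the sketch below uses the POSIX ucontext calls (getcontext,
makecontext, swapcontext) as a stand-in. It is not the savecontext and
returnto interface described above. swapcontext saves into one area and
resumes another in a single call, which is roughly savecontext followed
by returnto. The names run_demo, lwp_body, and trace are hypothetical,
and the 64 KB stack size is an arbitrary assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

static ucontext_t main_ctx, lwp_ctx;   /* play the role of save areas */
static int trace[8];
static int ntrace;

static void lwp_body(void)
{
    trace[ntrace++] = 1;               /* first run, on the LWP's own stack */
    swapcontext(&lwp_ctx, &main_ctx);  /* save here, resume the main context */
    trace[ntrace++] = 3;               /* resumed from the saved point */
    /* Returning from here resumes uc_link, which is main_ctx. */
}

/* Runs one LWP to completion, switching in and out of it once.
 * Returns the number of trace entries, or -1 on failure. */
static int run_demo(void)
{
    enum { STACK_SIZE = 64 * 1024 };
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    ntrace = 0;
    if (getcontext(&lwp_ctx) != 0) {
        free(stack);
        return -1;
    }
    lwp_ctx.uc_stack.ss_sp = stack;    /* the "newsp" of this sketch */
    lwp_ctx.uc_stack.ss_size = STACK_SIZE;
    lwp_ctx.uc_link = &main_ctx;
    makecontext(&lwp_ctx, lwp_body, 0);

    trace[ntrace++] = 0;
    swapcontext(&main_ctx, &lwp_ctx);  /* start the LWP */
    trace[ntrace++] = 2;
    swapcontext(&main_ctx, &lwp_ctx);  /* resume it where it left off */
    trace[ntrace++] = 4;

    free(stack);
    return ntrace;
}
```

After run_demo() the trace array reads 0, 1, 2, 3, 4, showing control
bouncing between the two stacks. Note that swapcontext also saves and
restores the signal mask, so it is heavier than the hand-written
switches in this posting.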

MIPS:

/* Code for MIPS R2000/R3000 architecture
 * Written by Zalman Stern April 30th, 1989.
 */
#include <regdef.h> /* Allow use of symbolic names for registers. */
#define regspace 9 * 4 + 4 + 6 * 8
#define floats 0
#define registers floats + 6 * 8
#define returnaddr regspace - 4
#define topstack 0
	.globl savecontext /* MIPS' C compiler doesn't prepend underscores. */
	.ent savecontext /* Insert debugger information. */
savecontext:
	li	t0, 1
	.extern	PRE_Block
	sb	t0, PRE_Block
	subu	sp, regspace
	.frame	sp, regspace, ra
/* Save registers. */
	sw	s0, registers + 0(sp)
	sw	s1, registers + 4(sp)
	sw	s2, registers + 8(sp)
	sw	s3, registers + 12(sp)
	sw	s4, registers + 16(sp)
	sw	s5, registers + 20(sp)
	sw	s6, registers + 24(sp)
	sw	s7, registers + 28(sp)
	sw	s8, registers + 32(sp)
/* Save return address */
	sw	ra, returnaddr(sp)
	.mask	0xc0ff0000, -4
/* Need to save floating point registers? */
	s.d	$f20, floats + 0(sp)
	s.d	$f22, floats + 8(sp)
	s.d	$f24, floats + 16(sp)
	s.d	$f26, floats + 24(sp)
	s.d	$f28, floats + 32(sp)
	s.d	$f30, floats + 40(sp)
	.fmask	0x55400000, regspace
	sw	sp, topstack(a1)
	beq	a2, $0, samestack
	addu	sp, $0, a2
samestack:
	jal	a0
	.end	savecontext

	.globl	returnto
	.ent	returnto
returnto:
	lw	sp, topstack(a0)
	lw	s0, registers + 0(sp)
	lw	s1, registers + 4(sp)
	lw	s2, registers + 8(sp)
	lw	s3, registers + 12(sp)
	lw	s4, registers + 16(sp)
	lw	s5, registers + 20(sp)
	lw	s6, registers + 24(sp)
	lw	s7, registers + 28(sp)
	lw	s8, registers + 32(sp)
/* Restore return address */
	lw	ra, returnaddr(sp)
/* Need to restore floating point registers? */
	l.d	$f20, floats + 0(sp)
	l.d	$f22, floats + 8(sp)
	l.d	$f24, floats + 16(sp)
	l.d	$f26, floats + 24(sp)
	l.d	$f28, floats + 32(sp)
	l.d	$f30, floats + 40(sp)
	addu	sp, regspace
	sb	$0, PRE_Block
	j	ra
	.end	returnto
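
The offsets defined at the top of the MIPS code (floats, registers,
returnaddr, regspace) describe an 88-byte frame. The C struct below is
only a sketch of that layout for checking the arithmetic. The struct and
field names are hypothetical, and it assumes an 8-byte double and a
4-byte uint32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the frame that savecontext pushes on the stack. */
struct mips_frame {
    double   fregs[6];  /* floats = 0: $f20, $f22, ..., $f30 */
    uint32_t sregs[9];  /* registers = floats + 6 * 8 = 48: s0..s8 */
    uint32_t ra;        /* returnaddr = regspace - 4 = 84 */
};

/* regspace = 9 * 4 + 4 + 6 * 8 = 88 */
_Static_assert(sizeof(struct mips_frame) == 9 * 4 + 4 + 6 * 8,
               "frame size must match regspace");
_Static_assert(offsetof(struct mips_frame, sregs) == 6 * 8,
               "integer registers start after the six doubles");
_Static_assert(offsetof(struct mips_frame, ra) == 9 * 4 + 4 + 6 * 8 - 4,
               "return address occupies the last word");
```

The save area itself only holds the stack pointer (topstack = 0);
everything else lives in this frame on the LWP's own stack.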

SPARC:

#include	<sun4/asm_linkage.h>
#include  <sun4/trap.h>
	.data	
	.globl	_PRE_Block
topstack	= 0
globals = 4
! savecontext(f, area1, newsp)
!     int (*f)(); struct savearea *area1; char *newsp;
	.text
	.globl	_savecontext
_savecontext:
	save	%sp, -SA(MINFRAME), %sp	! Get new window
	ta	ST_FLUSH_WINDOWS		! Flush all other active windows

	/* The following 3 lines do the equivalent of: _PRE_Block = 1 */
	set	_PRE_Block, %l0
	mov	1,%l1
	stb	%l1, [%l0]

	st	%fp,[%i1+topstack]		! area1->topstack = fp
	
	st	%g1, [%i1 + globals + 0]		! Save all globals just in case
	st	%g2, [%i1 + globals + 4]
	st	%g3, [%i1 + globals + 8]
	st	%g4, [%i1 + globals + 16]
	st	%g5, [%i1 + globals + 20]
	st	%g6, [%i1 + globals + 24]
	st	%g7, [%i1 + globals + 28]
	mov	%y, %g1				! Save this in the unlikely event that it's required
	st	%g1, [%i1 + globals + 32]

	cmp	%i2, 0
	be,a	L1				! if (newsp == 0) no stack switch
	nop
	add	%i2, STACK_ALIGN - 1, %i2	! SPARC requires stricter alignment than
	and	%i2, ~(STACK_ALIGN - 1), %i2	! malloc gives so I force alignment.
	sub	%i2, SA(MINFRAME), %fp
	call	%i0
	restore

L1:	call	%i0			! call f()
	nop


! returnto(area1)
!     struct savearea *area1;
	.globl _returnto
_returnto:
	ta	ST_FLUSH_WINDOWS		! Flush all other active windows
	ld	[%o0+topstack],%g1		! sp = area1->topstack
	sub	%g1, SA(MINFRAME), %fp	! Adjust sp to the right place
	sub	%fp, SA(MINFRAME), %sp

	ld	[%o0 + globals + 32], %g1		! Restore global regs back
	mov	%g1, %y
	ld	[%o0 + globals + 0], %g1
	ld	[%o0 + globals + 4], %g2
	ld	[%o0 + globals + 8], %g3
	ld	[%o0 + globals + 16], %g4
	ld	[%o0 + globals + 20], %g5
	ld	[%o0 + globals + 24], %g6
	ld	[%o0 + globals + 28], %g7

	restore					

	/* The following 3 lines do the equivalent of: _PRE_Block = 0 */
	set	_PRE_Block, %l0
	mov	0,%l1
	stb	%l1, [%l0]

	restore
	retl
	nop
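
The stack-switch arithmetic in _savecontext (round newsp up to
STACK_ALIGN, then leave room for one minimum frame) can be written in C
as below. This is a sketch only: the values STACK_ALIGN = 8 and
MINFRAME = 92 are assumptions about what <sun4/asm_linkage.h> defines,
and initial_fp is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_ALIGN 8u   /* assumed value from asm_linkage.h */
#define MINFRAME    92u  /* assumed: 16-word window + 6 arg words + 1 */

/* Round x up to the next multiple of STACK_ALIGN. */
#define SA(x) (((x) + (STACK_ALIGN - 1)) & ~(STACK_ALIGN - 1))

/* Mirrors the add/and/sub sequence that computes %fp for the new stack. */
static uintptr_t initial_fp(uintptr_t newsp)
{
    uintptr_t aligned =
        (newsp + (STACK_ALIGN - 1)) & ~(uintptr_t)(STACK_ALIGN - 1);
    return aligned - SA(MINFRAME);
}
```

With these assumed values SA(MINFRAME) is 96, so initial_fp(0x1001)
rounds up to 0x1008 and returns 0xFA8, while an already aligned 0x1000
yields 0xFA0.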

Sincerely,
Zalman Stern
Internet: zs01+@andrew.cmu.edu     Usenet: I'm soooo confused...
Information Technology Center, Carnegie Mellon, Pittsburgh, PA 15213-3890