[comp.lang.c] The winner!

peter@sugar.UUCP (Peter DaSilva) (07/14/87)

> > #pragma asm PDP-11
> > 	mov	(IP)+,WP	; anybody recognise this
> > 	jmp	@(WP)+		; piece of code?
> > #pragma C

> NEXT for FORTH?
> ---
> 	G. Ralph Kuntz N2HBN	UUCP: {ihnp4,allegra}!attunix!grk
> 				ARPA: rutgers.rutgers.edu!pisc2b!grk
> 				PACKET: N2HBN @ NN2Z

Give the man a cigar. More precisely, it's the NEXT from John James'
FIG_FORTH for the PDP-11, and one of my main reasons for considering the
PDP-11 instruction set the best thing DEC ever came up with (yes, I know
the VAX is sexier, but it's evolutionary... not revolutionary).

Using direct threaded code, you can actually implement both NEXT and DOCOL
in one instruction apiece. 

NEXT:

	jmp (IP)+

DOCOL:

	jsr (IP),<first word of definition>

I challenge anyone to come up with a faster interpreter anywhere (how can
you beat one instruction?). This is actually what the threaded interpreter
for the threaded version of DEC fortran-IV used, with R4==IP.
-- 
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter (I said, NO PHOTOS!)

ark@alice.UUCP (07/19/87)

In article <398@sugar.UUCP>, peter@sugar.UUCP writes:
> I challenge anyone to come up with a faster interpreter anywhere (how can
> you beat one instruction?). This is actually what the threaded interpreter
> for the threaded version of DEC fortran-IV used, with R4==IP.

How about zero instructions?

Here's how.  As in ordinary threaded code, the "program" is
a string of addresses of subroutines, and the IP register
points to the spot in the "program" that contains the address
of the next thing to do.

However, at the end of each subroutine, instead of a return
instruction, you just say

	jmp	*(IP)+

That is, jump to the location addressed by the contents of the
word addressed by the contents of IP, and bump IP by one word.

Presto -- a zero-instruction interpreter loop!

henry@utzoo.UUCP (Henry Spencer) (07/20/87)

> Using direct threaded code, you can actually implement both NEXT and DOCOL
> in one instruction apiece. 
> 
> NEXT:
> 	jmp (IP)+
> 
> DOCOL:
> 	jsr (IP),<first word of definition>
> 
> I challenge anyone to come up with a faster interpreter anywhere (how can
> you beat one instruction?)...

Given that that one instruction takes 1.68 us in the first case and 3.7 in
the second, this being on a relatively fast 11 (the 44), I would say it is
pretty easy to come up with a faster interpreter on something like a MIPS
machine...

(The point being that you do *pay* for having instructions complex enough
to do such things in one instruction.)
-- 
Support sustained spaceflight: fight |  Henry Spencer @ U of Toronto Zoology
the soi-disant "Planetary Society"!  | {allegra,ihnp4,decvax,utai}!utzoo!henry

firth@sei.cmu.edu (Robert Firth) (07/21/87)

In article <8326@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>> Using direct threaded code, you can actually implement both NEXT and DOCOL
>> in one instruction apiece. 
>> 
>> NEXT:
>> 	jmp (IP)+
>> 
>> DOCOL:
>> 	jsr (IP),<first word of definition>
>> 
>> I challenge anyone to come up with a faster interpreter anywhere (how can
>> you beat one instruction?)...
>
>Given that that one instruction takes 1.68 us in the first case and 3.7 in
>the second, this being on a relatively fast 11 (the 44), I would say it is
>pretty easy to come up with a faster interpreter on something like a MIPS
>machine...

On the M/500, the first sequence would be

	jr ip
	addiu ip,ip,4

and execute in 2 cycles. (of course, the second instruction executes
in the branch delay slot of the first).

Likewise, the second would be

	jal <first word of definition>
	sw rl,(ip)

similarly.  (The jal instruction saves the return address in rl and the
second instruction again executes in the delay slot).

On our M/500, the time taken is about 250 ns for each sequence.

Can anyone beat THAT?

tim@amdcad.UUCP (07/21/87)

In article <1946@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP (PUT YOUR NAME HERE) writes:
+-----
| On our M/500, the time taken is about 250 ns for each sequence.
| 
| Can anyone beat THAT?
+-----
Now go to a faster cycle time, like the mips M/800 (62ns) or the Am29000
(40ns, simulated):

NEXT:
	jmpi	ip
	add	ip,ip,4

DOCOL:
	call	ret_addr, <first word of definition>
	store	ret_addr, ip


That's now 80ns for each sequence.  

You can also use the large register file of the Am29000 to cache the
FORTH parameter stack on-chip, so primatives like "+" can be written
like:

	plus:
		add	lr1, lr1, lr0	; perform add : nos <- nos + tos
		add	gr1, gr1, 4	; pop stack

instead of:

	plus:
		load	r0, (sp)
		load	r1, 4(sp)
		add	r1, r1, r0
		store	r1, 4(sp)
		add	sp, sp, 4

which has to go to memory 3 times.


	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)

jty@tut.fi (Nokari) (07/22/87)

In article <398@sugar.UUCP> peter@sugar.UUCP (Peter DaSilva) writes:
>> > #pragma asm PDP-11
>> > 	mov	(IP)+,WP	; anybody recognise this
>> > 	jmp	@(WP)+		; piece of code?
>> > #pragma C

How about

	mov	@(IP)+,PC

(And I think MOV is faster than JMP)

W-register? Who needs W-register?

.. sorry, couldn't resist..
-- 
Jyrki Yli-Nokari
Intrinsic, Ltd.  ---  "The creators of C-77, The Fortran-to-C translator"
jty@intrin.FI  - or -  YLI@FINTUTA.BITNET

pmk@hall.UUCP (07/31/87)

In article <1946@aw.sei.cmu.edu>, firth@sei.cmu.edu (Robert Firth) writes:
> In article <8326@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
> >> 
> >> NEXT:
> >> 	jmp (IP)+
> >> 
> >> DOCOL:
> >> 	jsr (IP),<first word of definition>
> >
> >Given that that one instruction takes 1.68 us in the first case and 3.7 in
> >the second, this being on a relatively fast 11 (the 44), I would say it is
> >pretty easy to come up with a faster interpreter on something like a MIPS
> >machine...
> 
> On our M/500, the time taken is about 250 ns for each sequence.
> 
> Can anyone beat THAT?

Well, yes.  Cray-2 code for NEXT is:

	a1	0
	a2	1
	a1	a1+a.IP
	a.IP	a.IP+a2
	j	a1

and DOCOL is:

	a1	<first word of defn>
	r,a.IP	a1

It's faster than the M/500, but the hardware is a little more expensive.