peter@sugar.UUCP (Peter DaSilva) (07/14/87)
> > #pragma asm PDP-11 > > mov (IP)+,WP ; anybody recognise this > > jmp @(WP)+ ; piece of code? > > #pragma C > NEXT for FORTH? > --- > G. Ralph Kuntz N2HBN UUCP: {ihnp4,allegra}!attunix!grk > ARPA: rutgers.rutgers.edu!pisc2b!grk > PACKET: N2HBN @ NN2Z Give the man a cigar. More precisely, it's the NEXT from John James' FIG_FORTH for the PDP-11, and one of my main reasons for considering the PDP-11 instruction set the best thing DEC ever came up with (yes, I know the VAX is sexier, but it's evolutionary... not revolutionary). Using direct threaded code, you can actually implement both NEXT and DOCOL in one instruction apiece. NEXT: jmp (IP)+ DOCOL: jsr (IP),<first word of definition> I challenge anyone to come up with a faster interpreter anywhere (how can you beat one instruction?). This is actually what the threaded interpreter for the threaded version of DEC fortran-IV used, with R4==IP. -- -- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter (I said, NO PHOTOS!)
ark@alice.UUCP (07/19/87)
In article <398@sugar.UUCP>, peter@sugar.UUCP writes: > I challenge anyone to come up with a faster interpreter anywhere (how can > you beat one instruction?). This is actually what the threaded interpreter > for the threaded version of DEC fortran-IV used, with R4==IP. How about zero instructions? Here's how. As in ordinary threaded code, the "program" is a string of addresses of subroutines, and the IP register points to the spot in the "program" that contains the address of the next thing to do. However, at the end of each subroutine, instead of a return instruction, you just say jmp *(IP)+ That is, jump to the location addressed by the contents of the word addressed by the contents of IP, and bump IP by one word. Presto -- a zero-instruction interpreter loop!
henry@utzoo.UUCP (Henry Spencer) (07/20/87)
> Using direct threaded code, you can actually implement both NEXT and DOCOL > in one instruction apiece. > > NEXT: > jmp (IP)+ > > DOCOL: > jsr (IP),<first word of definition> > > I challenge anyone to come up with a faster interpreter anywhere (how can > you beat one instruction?)... Given that that one instruction takes 1.68 us in the first case and 3.7 in the second, this being on a relatively fast 11 (the 44), I would say it is pretty easy to come up with a faster interpreter on something like a MIPS machine... (The point being that you do *pay* for having instructions complex enough to do such things in one instruction.) -- Support sustained spaceflight: fight | Henry Spencer @ U of Toronto Zoology the soi-disant "Planetary Society"! | {allegra,ihnp4,decvax,utai}!utzoo!henry
firth@sei.cmu.edu (Robert Firth) (07/21/87)
In article <8326@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes: >> Using direct threaded code, you can actually implement both NEXT and DOCOL >> in one instruction apiece. >> >> NEXT: >> jmp (IP)+ >> >> DOCOL: >> jsr (IP),<first word of definition> >> >> I challenge anyone to come up with a faster interpreter anywhere (how can >> you beat one instruction?)... > >Given that that one instruction takes 1.68 us in the first case and 3.7 in >the second, this being on a relatively fast 11 (the 44), I would say it is >pretty easy to come up with a faster interpreter on something like a MIPS >machine... On the M/500, the first sequence would be jr ip addiu ip,ip,4 and execute in 2 cycles. (of course, the second instruction executes in the branch delay slot of the first). Likewise, the second would be jal <first word of definition> sw rl,(ip) similarly. (The jal instruction saves the return address in rl and the second instruction again executes in the delay slot). On our M/500, the time taken is about 250 ns for each sequence. Can anyone beat THAT?
tim@amdcad.UUCP (07/21/87)
In article <1946@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP (PUT YOUR NAME HERE) writes: +----- | On our M/500, the time taken is about 250 ns for each sequence. | | Can anyone beat THAT? +----- Now go to a faster cycle time, like the mips M/800 (62ns) or the Am29000 (40ns, simulated): NEXT: jmpi ip add ip,ip,4 DOCOL: call ret_addr, <first word of definition> store ret_addr, ip That's now 80ns for each sequence. You can also use the large register file of the Am29000 to cache the FORTH parameter stack on-chip, so primatives like "+" can be written like: plus: add lr1, lr1, lr0 ; perform add : nos <- nos + tos add gr1, gr1, 4 ; pop stack instead of: plus: load r0, (sp) load r1, 4(sp) add r1, r1, r0 store r1, 4(sp) add sp, sp, 4 which has to go to memory 3 times. -- Tim Olson Advanced Micro Devices (tim@amdcad.amd.com)
jty@tut.fi (Nokari) (07/22/87)
In article <398@sugar.UUCP> peter@sugar.UUCP (Peter DaSilva) writes: >> > #pragma asm PDP-11 >> > mov (IP)+,WP ; anybody recognise this >> > jmp @(WP)+ ; piece of code? >> > #pragma C How about mov @(IP)+,PC (And I think MOV is faster than JMP) W-register? Who needs W-register? .. sorry, couldn't resist.. -- Jyrki Yli-Nokari Intrinsic, Ltd. --- "The creators of C-77, The Fortran-to-C translator" jty@intrin.FI - or - YLI@FINTUTA.BITNET
pmk@hall.UUCP (07/31/87)
In article <1946@aw.sei.cmu.edu>, firth@sei.cmu.edu (Robert Firth) writes: > In article <8326@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes: > >> > >> NEXT: > >> jmp (IP)+ > >> > >> DOCOL: > >> jsr (IP),<first word of definition> > > > >Given that that one instruction takes 1.68 us in the first case and 3.7 in > >the second, this being on a relatively fast 11 (the 44), I would say it is > >pretty easy to come up with a faster interpreter on something like a MIPS > >machine... > > On our M/500, the time taken is about 250 ns for each sequence. > > Can anyone beat THAT? Well, yes. Cray-2 code for NEXT is: a1 0 a2 1 a1 a1+a.IP a.IP a.IP+a2 j a1 and DOCOL is: a1 <first word of defn> r,a.IP a1 It's faster than the M/500, but the hardware is a little more expensive.