neal@uikku.tut.fi (Norwitz Neal) (11/09/90)
I have a recursive algorithm that I would like to run as fast as possible.  I am running a part of it on a SPARCstation SLC.  But at this rate, it might finnish in a couple of years!!  I was wondering if someone could suggest a couple of architectures that may suit this recursive algorithm better than others.  The program takes almost no memory (about 12K), so all I need is speed.  There are no floating point operations; the only arithmetic is a couple of increments.  Also, any information about speed ratings (in mips) for many different computers would be appreciated.  Thanks a lot.

Neal Norwitz
neal@tut.fi
Tampere University of Technology
gillies@m.cs.uiuc.edu (11/09/90)
> I have a recursive algorithm that I would like to run as fast as
> possible.  I am running a part of it on a SPARCstation SLC.  But at
> this rate, it might finnish in a couple of years!!  I was wondering if
> someone could suggest a couple of architectures that may suit this
> recursive algorithm better than others.

Oh, that's easy.  Any RISC BESIDES the SPARC should probably run much faster.  In a recent benchmark, I found that the SPARCstation 1 was only 2.8 times faster than a 16MHz 68020/Mac II on deeply recursive code, but 6 times faster on non-recursive code.  SPARC seems to suck the big wazoo on deep recursion, presumably because of its register-windowing design.
tr@samadams.princeton.edu (Tom Reingold) (11/09/90)
In article <1990Nov8.212412.14317@funet.fi> neal@uikku.tut.fi (Norwitz Neal) writes:
$ I have a recursive algorithm that I would like to run as fast as
$ possible.  I am running a part of it on a SPARCstation SLC.  But at
$ this rate, it might finnish in a couple of years!!  I was wondering if
$ someone could suggest a couple of architectures that may suit this
$ recursive algorithm better than others.  The program takes no memory
$ (about 12K).  So all I need is speed.  There are no floating point
$ operations, except a couple of increments.
$
$ Also, any information about speed ratings (in mips) from many
$ different computers would be appreciated.

You pose a vague question, so I don't know what we (the net) can suggest.  If it's really going to take years with the current algorithm, it sounds worthwhile changing it to an iterative one.

Have you looked at lisp systems?  Some of them handle recursion pretty efficiently.  I am not talking about lisp running under UNIX; I am talking about machines with lisp semantics in the processor.
--
Tom Reingold
tr@samadams.princeton.edu OR ...!princeton!samadams!tr
"Warning: Do not drive with Auto-Shade in place.  Remove from
windshield before starting ignition."
blc@mentat.COM (Bruce Carneal) (11/10/90)
For tail calls at least, the responsibility/blame for excessive window overflows lies with the compiler, not the architecture. Hopefully the newer compilers for the SPARC do/will handle tail calls without stack growth.
seanf@sco.COM (Sean Fagan) (11/11/90)
In article <3300209@m.cs.uiuc.edu> gillies@m.cs.uiuc.edu writes:
>SPARC seems to suck the big wazoo
>on deep recursion, presumably because of its register-windowing
>design.

More to the point, *any* processor with register windows is not going to do too well on deeply recursive code.  There are tricks a programmer can do to get rid of some recursion, and some that the compiler can do, of course.

Some people at Berkeley did some studies with their RISC chip and LISP; since LISP is generally very recursive, and the Berkeley people seem to like register windows, those are probably worth reading.
--
-----------------+
Sean Eric Fagan  | "*Never* knock on Death's door: ring the bell and
seanf@sco.COM    | run away!  Death hates that!"
uunet!sco!seanf  |     -- Dr. Mike Stratford (Matt Frewer, "Doctor, Doctor")
(408) 458-1422   | Any opinions expressed are my own, not my employers'.
khb@chiba.Eng.Sun.COM (Keith Bierman fpgroup) (11/11/90)
In article <8662@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes:
>>SPARC seems to suck the big wazoo
>>on deep recursion, presumably because of its register-windowing
>>design.
>
>More to the point, *any* processor with register windows is not going
>to do too well on deeply recursive code.

Lest we forget, the SPARC ISA provides register windows, but their use is up to software conventions (viz. CALL does not result in a window spin).
--
----------------------------------------------------------------
Keith H. Bierman    kbierman@Eng.Sun.COM | khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33                    | (415 336 2648)
Mountain View, CA 94043
spot@WOOZLE.GRAPHICS.CS.CMU.EDU (Scott Draves) (11/11/90)
In article <334@mentat.COM>, blc@mentat.COM (Bruce Carneal) writes:
|> Hopefully the newer compilers
|> for the SPARC do/will handle tail calls without stack
|> growth.

It is my understanding that the SunOS 4.x compiler does detect and eliminate tail recursion.  This is verified by compiling (-O4 -S)

ENUM(i) { if (i==0) return 1; else return ENUM(i-1); }

and seeing

_ENUM:
L77001:
        tst     %o0
        bne     L77003
        nop
        retl
        add     %g0,1,%o0
L77003:
        b       L77001
        dec     %o0

It seems odd that it would generate a branch to another branch.  Can anyone who knows comment?

Consume         Scott Draves
Be Silent       spot@cs.cmu.edu
Die
Bruce.Hoult@actrix.co.nz (Bruce Hoult) (11/12/90)
>More to the point, *any* processor with register windows is not going
>to do too well on deeply recursive code.

I don't know much about this, but it seems to me that a machine without register windows is going to have to put those values on the stack in any case.  A machine with lots of registers that have to be stored and loaded in deep recursion should at least be able to take advantage of any burst mode that the memory subsystem supports, thus giving an advantage over normal architectures.
firth@sei.cmu.edu (Robert Firth) (11/13/90)
In article <1990Nov11.102907.7706@cs.cmu.edu> spot@WOOZLE.GRAPHICS.CS.CMU.EDU (Scott Draves) writes:
>It is my understanding that the SunOS 4.x compiler does
>detect and eliminate tail recursion.  This is verified by
>compiling (-O4 -S)
>
>ENUM(i) { if (i==0) return 1; else return ENUM(i-1); }
>
>and seeing
>
>_ENUM:
>L77001:
>        tst     %o0
>        bne     L77003
>        nop
>        retl
>        add     %g0,1,%o0
>L77003:
>        b       L77001
>        dec     %o0
>
>It seems odd that it would generate
>a branch to another branch.  can anyone
>who knows comment?

As a preliminary note: the program mentioned by the original poster does not lend itself to tail-recursion elimination, so this won't work for him.  Another option, since the code is fairly short, would be to hack an assembler version that didn't use the register windows (as was rightly pointed out, they are optional, not an ineradicable part of the call protocol).  I actually tried that on Ackermann's Function (what else?) and got a factor of 7 speedup.

Turning to the above code - yes, there does seem to be a defect in the low-level optimiser here.  There are problems in eliding branch chains on machines with delay slots, but that doesn't apply above: the delay slot of the BNE is empty, so one could close the chain and move the decrement of o0 to the label destination.  However, one can do better.  Since register o0 is dead on the fall-through (it is destroyed by the add), one can float up both the instructions at L77003:

        bne     L77001
        dec     %o0

thus closing the chain and filling the delay slot.  This reduces the five-instruction loop to three instructions, for a 65% speed up.
henry@zoo.toronto.edu (Henry Spencer) (11/14/90)
In article <8662@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes:
>>SPARC seems to suck the big wazoo
>>on deep recursion, presumably because of its register-windowing
>>design.
>
>More to the point, *any* processor with register windows is not going to do
>too well on deeply recursive code.

Actually, the AMD 29k ought to do all right.  The problem is not register windows vs. recursion, it is *fixed-size* register windows vs. recursion.  SPARC ends up saving and restoring a lot of registers that aren't actually in use, because the allocation quantum is 16 registers.  On the 29k the quantum is 2 registers (1 if you don't care about being floating-point compatible with the 29050) and it should almost never be necessary to save and restore unused registers.
--
"I don't *want* to be normal!"  | Henry Spencer at U of Toronto Zoology
"Not to worry."                 | henry@zoo.toronto.edu   utzoo!henry
preston@ariel.rice.edu (Preston Briggs) (11/14/90)
In article <1990Nov12.054723.18015@actrix.co.nz> Bruce.Hoult@actrix.co.nz (Bruce Hoult) writes:
>>More to the point, *any* processor with register windows is not going
>>to do too well on deeply recursive code.
>I don't know much about this, but it seems to me that a machine without
>register windows is going to have to put those values on the stack

The point is _fixed-size_ register windows.  A window-less machine might have to save 3 registers per call.  A fixed-size window machine might save 16 or 32 registers per call.  Even with burst-mode saves, the volume will overwhelm the cache, and perhaps virtual memory.

Preston
lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (11/14/90)
In article <1990Nov13.200051.12329@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <8662@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes:
>>More to the point, *any* processor with register windows is not going to do
>>too well on deeply recursive code.
>The problem is not register
>windows vs. recursion, it is *fixed-size* register windows vs. recursion.
>SPARC ends up saving and restoring a lot of registers that aren't actually
>in use, because the allocation quantum is 16 registers.

That's a reasonable argument, but not the only one.

First, compilers for non-window machines try to dribble values out to memory, avoiding bursts that could cause processor stalls.  (The infamous VAX "calls" instruction writes a burst, and that seriously stalled the VAX 11/780.)  Obviously, writing a bunch of windows is a burst.

Second, it wasn't clear from the original posting that the SPARC was actually running slower than some other machine - the poster just said it was slow.  Maybe he had a big computation?  Did he try the -O switch?

Third, programs that recurse can often be written as iterations.  All machines, SPARC included, run the iterative form faster.

I'm unconvinced that register windows are a good idea, but I'm also unconvinced that SPARC is some sort of disaster area.
--
Don             D.C.Lindsay
blc@mentat.COM (Bruce Carneal) (11/14/90)
In article <1990Nov11.102907.7706@cs.cmu.edu> spot@WOOZLE.GRAPHICS.CS.CMU.EDU (Scott Draves) writes:
>It is my understanding that the SunOS 4.x compiler does
>detect and eliminate tail recursion.  This is verified by
>compiling (-O4 -S)
> (example of intra procedural tail recursion deleted)

Unfortunately this doesn't work in the general case.  Compiling under SunOS 4.1 with 'cc -O4 -S ...' the trivial program fragment:

func1(arg) int arg; { return func2(arg); }
func2(arg) int arg; { return func3(arg); }
func3(arg) int arg; { return arg; }

you get (with some junk removed):

_func1:
        save    %sp,-96,%sp
        call    _func2,1
        mov     %i0,%o0
        ret
        restore %g0,%o0,%o0
_func2:
        save    %sp,-96,%sp
        call    _func3,1
        mov     %i0,%o0
        ret
        restore %g0,%o0,%o0
_func3:
        retl
        nop

As shown, no tail call optimization occurs.  This form of call is used along the main paths of some Streams modules (and elsewhere of course).  Later compilers can/do optimize those calls.
jcallen@Encore.COM (Jerry Callen) (11/14/90)
In article <11092@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>First, compilers for non-window machines try to dribble values out to
>memory, avoiding bursts that could cause processor stalls.  (The
>infamous VAX "calls" instruction writes a burst, and that seriously
>stalled the VAX 11/780.)  Obviously, writing a bunch of windows is a
>burst.

I'm curious: what's wrong with a burst write of registers out to memory?  It seems to me that an intelligent memory system (maybe with a hint from the processor that a burst is coming) could accept one word per cycle and get those registers out to memory very fast.  If the processor is "stalled" while the registers go out, how is this different from procedure prologue code in a RISC that does a bunch of stores in a row?  In fact, the RISC has to fetch the instructions, so the RISC should take MORE time to save the registers.

I'll take a stab at answering my own question (I like talking to myself :-)).  Conceivably the RISC could NOT do the saves as a block in the prologue, but (as Donald says) dribble them out as each register is needed.  One complication here is that the epilogue has to have some way of knowing which registers actually have to be restored (in the case of multiple returns, or, for Ada, exceptions).
--
Jerry Callen
jcallen@encore.com
henry@zoo.toronto.edu (Henry Spencer) (11/15/90)
In article <11092@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>First, compilers for non-window machines try to dribble values out to
>memory, avoiding bursts that could cause processor stalls.  (The
>infamous VAX "calls" instruction writes a burst, and that seriously
>stalled the VAX 11/780.)  Obviously, writing a bunch of windows is a
>burst.

The tradeoffs here are subtle.  Many memory systems have much higher bandwidth for sequential bursts than for random accesses, although your bus and processor have to be designed to exploit this.  There is also the question of stalls waiting for data to be *read* from memory, while the window machine can charge ahead because the data is probably in a register instead.

>I'm unconvinced that register windows are a good idea...

The MIPSCo people, who measure almost everything, say that for ordinary C code it doesn't matter much either way, *if* you're willing to assume sophisticated compilers.  (Of course, they are perhaps not entirely unbiased... :-))  They also say that for languages with highly dynamic calling patterns (e.g. C++ and Smalltalk), where the compiler has less of a handle on what's happening, register windows would be a modest win.  They're definitely a big win if your compiler isn't too bright.

>but I'm also
>unconvinced that SPARC is some sort of disaster area.

I wouldn't call it a disaster area exactly, at least not for this reason :-), but rapid large excursions in stack depth are definitely expensive on it.
--
"I don't *want* to be normal!"  | Henry Spencer at U of Toronto Zoology
"Not to worry."                 | henry@zoo.toronto.edu   utzoo!henry
brandis@inf.ethz.ch (Marc Brandis) (11/15/90)
In article <1990Nov12.054723.18015@actrix.co.nz> Bruce.Hoult@actrix.co.nz (Bruce Hoult) writes:
>>More to the point, *any* processor with register windows is not going
>>to do too well on deeply recursive code.
>
>I don't know much about this, but it seems to me that a machine without
>register windows is going to have to put those values on the stack
>in any case.  A machine with lots of registers that have to be stored

When you look at typical recursive procedures, you will notice that they very often have only a small number of local variables, which can be held in registers.  In an architecture with register windows, a trap occurs on window overflow or underflow and causes whole register windows to be dumped to memory; most of the time, too much is stored and restored.  In an architecture without register windows, a smart compiler can remove much of this overhead.

There are also quite a lot of cases where recursive procedures use variables in one path through the procedure but not in another.  In this case it may be better to have the variables in memory anyway, so nothing would have to be stored at all.

(* I speak only for myself. *)
Marc-Michael Brandis, Institut fuer Computersysteme, ETH-Zentrum
CH-8092 Zurich
email: brandis@inf.ethz.ch   brandis@iis.ethz.ch
berg@cip-s04.informatik.rwth-aachen.de (AKA Solitair) (11/16/90)
Norwitz Neal writes:
>I have a recursive algorithm that I would like to run as fast as possible.
>I am running a part of it on a SPARCstation SLC.  But at this rate, it might
>finnish in a couple of years!!

Why don't you have a look at the FORTH chip?  (I don't remember the manufacturer; the part number should be something like 4016 if I recall correctly.)  The processor has no general purpose registers, just a stack cache; the thing was designed to execute *native* FORTH (a medium- to high-level threaded stack-based language) code.  Last I heard about it, they were designing a new one.  This processor was practically designed for recursive algorithms.
--
Sincerely,                 berg%cip-s01.informatik.rwth-aachen.de@unido.bitnet
Stephen R. van den Berg.
"I code it in 5 min, optimize it in 90 min, because it's so well optimized:
it runs in only 5 min.  Actually, most of the time I optimize programs."
shri@ncst.ernet.in (H.Shrikumar) (11/19/90)
In article <3700@rwthinf.UUCP>
>Norwitz Neal writes:
>>I have a recursive algorithm that I would like to run as fast as possible.
>>I am running a part of it on a SPARCstation SLC.  But at this rate, it might
>>finnish in a couple of years!!

and berg%cip-s01.informatik.rwth-aachen.de@unido.bitnet writes:
>Why don't you have a look at the FORTH-chip (don't remember the manufacturer,
>the part number should be something like 4016 if I recall correctly), the
>processor has no general purpose registers, just a stack cache, the thing

The start-up that thought up the 4016 had start-up problems, and sold the technology to Harris.  Harris cleaned up a lot of the hardware bugs in the Novix 4016, christened it the RTX 2000 (*R*eal *T*ime e*X*press), played with it for a little while, and decided to discontinue the line.

Sad.  See other posts in this newsgroup and comp.lang.forth for more details.

But your point on stack-caching machines and recursive routines is well taken.

Just to comment, I think the original poster was talking about mature machines, and the Novix/RTX was yet no more than a chip and an architecture concept.  (25ns RAM, three banks of it, no OS yet ...)

-- shrikumar ( shri@ncst.in )
martin@ee.uni-sb.de (Martin Huber) (12/13/90)
I forward this for a friend.  *Don't* reply to me, reply to the address in the Reply-To header.  Here goes:

In article <1093@shakti.ncst.ernet.in> you write:
>        The start-up that thought up the 4016 had start-up problems, and
>sold the technology to Harris.  Harris cleaned up a lot of the hardware
>bugs in the Novix 4016, christened it the RTX 2000 (*R*eal *T*ime e*X*press),
>played with it for a little while, and decided to discontinue the line.
> [...]
>        Just to comment, I think the original poster was talking about
>mature machines, and the Novix/RTX was yet no more than a chip and
>an architecture concept.  (25ns RAM, three banks of it, no OS yet ...)

Another comment: I was at the November '90 trade show ELECTRONICA in Munich, Germany, and visited HARRIS.  They were perfectly willing to sell the RTX2000.  I got a databook, infos ...  There was a Mr. Klaus Flesch, who is apparently something like the general manager of the company FORTH SYSTEMS, Germany.  He *sells* RTX2000-based products.

IMHO, the RTX2000 is available.  It *is* mature: 10 MFIPS (FORTH micro-instructions per second) is definitely state of the art among RISCs.  The hardware is micro-codable.  I think this is better than many of today's machines.
Prices should be in the $500-$5000 range (please excuse my bad main memory, it is *not* a standard memory); in any case, they *are* cheaper than a big useless number cruncher.  Take it or leave it, it's up to you!

Martin (mahu@ee.uni-sb.de)

Disclaimer: Mostly I wait for technology to improve, but sometimes improvements go by unnoticed.  Moral: Watch out!