ram@wb1.cs.cmu.edu (Rob MacLachlan) (06/11/89)
I am a Lisp systems programmer for the CMU Common Lisp project. I am currently working on a new Common Lisp compiler. I have several comments to make about evaluating a system for Lisp, but I generally won't come down on one side or the other of the SPARC/MIPS question, since I really don't know the important system details.

The first thing I will say, and I believe the most important, is *consider the non-processor issues*. I say this up front, since I suspect this sort of thing will get short shrift in the following flamage. The biggest determinant of performance of many Lisp systems is memory system performance:
 -- How much memory can you put in? If not 16 meg+, forget it for serious work.
 -- How much memory can you afford to put in?
 -- How fast are the disks? Even with lots of memory you will page.
 -- How big is the cache?

As to the specific RISC vs. CISC question for Lisp, I strongly favor a RISC approach. In addition to the usual RISC arguments, there are several Lisp-specific factors that favor RISC:
 -- Complex addressing modes are difficult or impossible to use due to the way Lisp data structures are laid out (with indirections everywhere).
 -- Complex call instructions not designed for Lisp are rarely useful for Lisp. Generally Lisp call sequences for CISCs ignore hairy call features and use multiple simple instructions.

The general idea is that the complex features of widely available CISC architectures aren't designed to support Lisp, and therefore are rarely useful for Lisp. You end up using the subset of the instruction set that resembles a RISC.

As to SPARC vs. MIPS:

Pro SPARC:
 -- Has special tagged +/- instructions for short integer arithmetic. This can speed up these operations in the absence of appropriate FIXNUM declarations. Integer arithmetic is important in many (but not all) symbolic applications for vector indices, loop counters, etc. Even if your program doesn't use integers much, the underlying Lisp run-time system uses them for things such as I/O, hashtables, etc. Of course, the system code *should* have the right declarations, so tagged operations shouldn't have a big effect on system performance.
 -- Has register windows, which *potentially* are a win for Lisp call/return, since the main competing technology (global register allocation) works poorly with Lisp's run-time function linkage. In practice, I don't know how well SPARC's windows mesh with Lisp call-sequence requirements.

Pro MIPS:
 -- Has lots of registers, so:
     - It is easier to successfully allocate locals in registers.
     - Important globals of the run-time system can be kept in registers. A Lisp system may have three stack pointers, several heap pointers, a couple of registers for frame pointers, etc.

I am primarily a compiler writer, so perhaps it is not surprising that I somewhat favor MIPS: it gives me more room (registers) to play with, without preempting any design decisions. Also, I am interested in global register allocation algorithms and "block compilation" of programs (resolving function references at compile time).

On the other hand, if I were designing an architecture for Lisp, I would certainly make checked fixnum arithmetic "free", and I would also seriously consider using register windows.

Note that I am not convinced register windows are necessarily a good thing for Lisp. Studies of non-Lisp program performance are not very relevant, since Lisp functions are clearly statistically different in some ways:
 -- Functions tend to be smaller.
 -- Recursion is more common.

I suspect that small or variable-sized windows would be a win for Lisp.

Rob
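[Editor's sketch] The "tagged +/-" point above may be easier to follow with a small model. The Python below imitates what a SPARC-style tagged add checks in one instruction. The encoding is an assumption for illustration, not something stated in the post: 32-bit words, a two-bit type tag in the low bits, and tag 00 meaning fixnum. The names `box_fixnum`, `tagged_add`, `lisp_add`, and `slow_path` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
TAG_MASK = 0b11      # assumption: the low two bits of a word hold the type tag
FIXNUM_TAG = 0b00    # assumption: fixnums carry tag 00


def to_signed32(word):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def box_fixnum(n):
    """Encode a Python int as a tagged word (value shifted left by two)."""
    if not -(1 << 29) <= n < (1 << 29):
        raise OverflowError("value does not fit in a 30-bit fixnum")
    return (n << 2) & MASK32


def unbox_fixnum(word):
    """Decode a tagged fixnum word back to a Python int."""
    return to_signed32(word) >> 2


def tagged_add(a, b):
    """Add two tagged words; flag overflow on a non-fixnum tag or 32-bit overflow."""
    bad_tag = (a & TAG_MASK) != FIXNUM_TAG or (b & TAG_MASK) != FIXNUM_TAG
    total = to_signed32(a) + to_signed32(b)
    too_big = not -(1 << 31) <= total < (1 << 31)
    return total & MASK32, bad_tag or too_big


def lisp_add(a, b, slow_path):
    """Generic +: take the fast path unless the tagged add flags a problem."""
    result, overflow = tagged_add(a, b)
    return slow_path(a, b) if overflow else result
```

Because the tag is 00, adding two boxed fixnums yields a correctly boxed sum with no unboxing step. Without hardware help, the same check costs separate mask, compare, and branch instructions, which is the overhead FIXNUM declarations let a compiler remove.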
dpm@cs.cmu.edu (David Maynard) (06/12/89)
ram@wb1.cs.cmu.edu (Rob MacLachlan) writes:
> -- How fast are the disks? Even with lots of memory you will page.

I don't have direct experience, but recent Sun-Spots (comp.sys.sun) articles have indicated that the SparcStation is significantly better at disk I/O than the DECStation 3100. (Gee, could that DMA be paying off?)

However, these same articles indicate that the 3100 can be significantly better for number crunching. I suspect that part of the difference here is the "quality" of the math libraries. It seems to take more work to make a Sun run math codes faster. I'm sure the 3100 does have a raw speed advantage, I'm just not sure it is as great as some people have reported.

I believe that more AI software is currently available for the Suns. Sun has had longer to build an AI base. It is likely that DEC will try to close this gap though.

---
David P. Maynard (dpm@cs.cmu.edu)
Dept. of Electrical and Computer Engineering, Carnegie Mellon University
---
These are my opinions. I haven't asked CMU what our official opinion is.
khb@chiba.Sun.COM (chiba) (06/13/89)
In article <EYYf=ky00jcqNYcW4L@cs.cmu.edu> dpm@cs.cmu.edu (David Maynard) writes:
>ram@wb1.cs.cmu.edu (Rob MacLachlan) writes:
>> -- How fast are the disks? Even with lots of memory you will page.
>
>I don't have direct experience, but recent Sun-Spots (comp.sys.sun)
>articles have indicated that the SparcStation is significantly better
>at disk I/O than the DECStation 3100. (Gee, could that DMA be paying
>off?)
>
>However, these same articles indicate that the 3100 can be
>significantly better for number crunching. I suspect that part of the
>difference here is the "quality" of the math libraries. It seems to
>take more work to make a Sun run math codes faster. I'm sure the 3100
>does have a raw speed advantage, I'm just not sure it is as great as
>some people have reported.

Math performance is partially due to design choices in the libraries. One of the key questions (for C) is which standard ... ANSI, SVID, K&R, Posix ... all require slightly different answers to certain cases. Sun's arithmetic group is (perhaps) too concerned with getting the correct answer ... Seymour has proved that folks prefer fast to accurate :>

>
>I believe that more AI software is currently available for the Suns.
>Sun has had longer to build an AI base. It is likely that DEC will
>try to close this gap though.

When there is a DEC sponsored LISP, we can try to determine real performance figures. I am not very AI oriented, but in my limited AI experience IO dominates in most "real" systems (as pointed out above), so I expect the SS330 to perform better on some reasonable set of application-sized codes.

After IO, the next performance "feature" of AI codes is probably memory subsystem speed. The key difference between the current MIPS and SPARC implementations is cycles for ld/sto ... MIPS is faster; but this results in many stalls on the 3100 (vs. say, the MIPS M2000 with its 4-deep buffered write thru cache) ... the SS330 has enough buffering that stores tend not to lock up ... as best I can remember, the DS3100 has write thru, no buffering ... so loads should be faster, stores slower.

After those, the issues of tagged arithmetic, and register windows vs. shared pool probably kick in. My experience is that IO vastly dominates these second order effects on "real AI applications".

> ---
> These are my opinions. I haven't asked CMU what our official opinion is.

Ditto. I haven't even asked Sun what our opinion is.

Keith H. Bierman      |*My thoughts are my own. Only my work belongs to Sun*
It's Not My Fault     | Marketing Technical Specialist ! kbierman@sun.com
I Voted for Bill &    | Languages and Performance Tools.
Opus                  (* strange as it may seem, I do more engineering now *)
mash@mips.COM (John Mashey) (06/13/89)
In article <109577@sun.Eng.Sun.COM> khb@sun.UUCP (chiba) writes:
>After IO, the next performance "feature" of AI codes is probably
>memory subsystem speed. The current MIPS vs. SPARC implementation key
>difference is cycles for ld/sto ... MIPS is faster; but this results
>in many stalls on the 3100 (vs. say, the MIPS M2000 with its 4-deep
>buffered write thru cache) ... the SS330 has enough buffering that
>stores tend not to lock up ... as best I can remember, the DS3100 has
>write thru, no buffering ... so loads should be faster, stores slower.

Most R2000 or R3000 systems, DS3100 included, use 4-deep write-buffers, often built with R2020 Write Buffers.

Unless I misread the Sun info (which I don't have handy), both SS1 and SS330 use write-thru caches with a 1-deep write buffer: please correct if this is wrong.
--
-john mashey  DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:  {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD:   408-991-0253 or 408-720-1700, x253
USPS:  MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
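[Editor's sketch] To see why write-buffer depth matters for a write-through cache, here is a toy Python model that counts the cycles a CPU would stall on a burst of stores. It is a rough illustration only: the function name `store_stall_cycles`, the single fixed `drain_cycles` latency, and the trace format are all assumptions, and the model does not describe the actual DS3100, SS1, or SS330 memory systems.

```python
from collections import deque


def store_stall_cycles(store_cycles, depth, drain_cycles):
    """Count CPU stall cycles caused by a full write buffer.

    store_cycles: sorted cycle numbers at which the program wants to store
    depth:        number of entries in the write buffer
    drain_cycles: assumed cycles for memory to retire one buffered write
    """
    pending = deque()    # completion times of buffered writes, oldest first
    memory_free_at = 0   # cycle at which memory can begin the next write
    stall = 0            # total stall cycles accumulated so far
    for want in store_cycles:
        now = want + stall              # earlier stalls push later stores back
        while pending and pending[0] <= now:
            pending.popleft()           # these writes have already retired
        if len(pending) == depth:
            wait = pending[0] - now     # buffer full: wait for the oldest write
            stall += wait
            now += wait
            pending.popleft()
        start = max(now, memory_free_at)
        memory_free_at = start + drain_cycles
        pending.append(memory_free_at)
    return stall
```

With four back-to-back stores (`[0, 1, 2, 3]`) and an assumed five-cycle drain, a 4-deep buffer absorbs the burst with no stalls, while a 1-deep buffer stalls for 12 cycles in this model. Widely spaced stores stall in neither case.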
khb@chiba.Sun.COM (Keith Bierman - SPD Languages Marketing -- MTS) (06/14/89)
In article <21525@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>
>Most R2000 or R3000 systems, DS3100 included, use 4-deep write-buffers,
>often built with R2020 Write Buffers.

I stand corrected. Now all I have to do is understand why our 3100 performance does not scale trivially to our M2000 .... suggestions? Memory system had seemed so attractive as an explanation ....

>Unless I misread the Sun info (which I don't have handy), both SS1 and
>SS330 use write-thru caches with a 1-deep write buffer: please correct
>if this is wrong.

Reverse engineering (i.e. looking at the performance on the same codes), one can see the SS1 stalling much more often than stingray. Clearly my memory is faulty (or I would not have misquoted the MIPS/DEC lit :>) .. but my recollection is that 4/330 has a double word of buffering.

Sad to say, I must confess to not having the lit handy. I will check, and if no one beats me to the correction, I will post one (if needed). :>

cheers

Keith H. Bierman      |*My thoughts are my own. Only my work belongs to Sun*
It's Not My Fault     | Marketing Technical Specialist ! kbierman@sun.com
I Voted for Bill &    | Languages and Performance Tools.
Opus                  (* strange as it may seem, I do more engineering now *)
mash@mips.COM (John Mashey) (06/14/89)
In article <109849@sun.Eng.Sun.COM> khb@sun.UUCP (Keith Bierman - SPD Languages Marketing -- MTS) writes:
>In article <21525@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>>Most R2000 or R3000 systems, DS3100 included, use 4-deep write-buffers,
>>often built with R2020 Write Buffers.

>I stand corrected. Now all I have to do is understand why our 3100
>performance does not scale trivially to our M2000 .... suggestions ?
>Memory system had seemed so attractive as an explaination ....

The memory systems are very different. R2000-based machines use the simplest-possible cache, i.e., write-thru with 1-word refills on miss and cache-word invalidates on partial-word writes. R3000s are much more complicated, as they can do anywhere from 1 to 32 words/block burst refill, instruction streaming, direct drive of the cache rams, optional read-modify-write instead of invalidate for partial-word writes; they typically have memory systems with more interleaving, page-mode DRAMs, etc, etc.

>Reverse engineering (i.e. looking at the performance on the same
>codes) one can see the SS1 stalling much more often than stingray.
>Clearly my memory is faulty (or I would not have misquoted the
>MIPS/DEC lit :>).. but my recollection is that 4/330 has a double word
>of buffering.

Oops, I recall seeing that also, but I think I was thinking it was 1 doubleword, rather than 2 32-bit words. Now, I don't know which.
--
-john mashey  DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:  {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com
DDD:   408-991-0253 or 408-720-1700, x253
USPS:  MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
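[Editor's sketch] The contrast between 1-word refills and multi-word burst refills can be illustrated with a small direct-mapped cache model in Python. This is a sketch under stated assumptions: the function `count_misses`, the cache size, and the access trace are hypothetical, and the model counts misses only. It ignores the longer time a burst refill takes per miss, instruction streaming, and everything else in a real R2000 or R3000 memory system.

```python
def count_misses(word_addresses, cache_words, block_words):
    """Count misses in a direct-mapped cache that refills block_words per miss."""
    num_blocks = cache_words // block_words
    tags = [None] * num_blocks      # block number currently held in each slot
    misses = 0
    for addr in word_addresses:
        block = addr // block_words
        slot = block % num_blocks
        if tags[slot] != block:
            misses += 1             # refill the whole block on a miss
            tags[slot] = block
    return misses


# A sequential scan of 1024 words through an assumed 256-word cache.
scan = range(1024)
one_word = count_misses(scan, cache_words=256, block_words=1)
four_word = count_misses(scan, cache_words=256, block_words=4)
burst_32 = count_misses(scan, cache_words=256, block_words=32)
```

For this purely sequential trace the miss count falls in proportion to the block size (1024, 256, and 32 misses respectively). Pointer-chasing Lisp workloads have much less spatial locality, so the benefit there would be smaller than this best case suggests.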
khb@chiba.Sun.COM (chiba) (06/14/89)
In article <21641@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>>MIPS/DEC lit :>).. but my recollection is that 4/330 has a double word
>>of buffering.

>Oops, I recall seeing that also, but I think I was thinking it was
>1 doubleword, rather than 2 32-bit words. Now, I don't know which.
>--

Works either way. The SPARC store-double instruction "fits", as do two single stores.

Keith H. Bierman      |*My thoughts are my own. Only my work belongs to Sun*
It's Not My Fault     | Marketing Technical Specialist ! kbierman@sun.com
I Voted for Bill &    | Languages and Performance Tools.
Opus                  (* strange as it may seem, I do more engineering now *)