laba-3en@web-1c.berkeley.edu (Raja S Kushalnagar) (02/02/90)
While writing and analyzing the RISC II FPU last summer under my professor, I noticed several interesting things about RISC vs. CISC. Pipelining is inherently much easier on a RISC, and can be carried on to many stages easily owing to the reduced instruction set and the fixed length of the instructions, as opposed to the CISC's variable-length encodings. It is also much easier to implement window register meshing on RISCs. Those two factors alone, at least in the original Berkeley RISC II, were responsible for about 40% of the speedup. Add much more efficient compiling, as was done with the Stanford MIPS design, and that would account for most of the speedup in today's Sparcs and MIPS machines, along with the reduced instruction set itself.

The Motorola 68040, if what I've been reading about here is to be believed, has almost been able to surmount the difficulties of deep pipelining, though it seems not to have surmounted the issue of efficient window register meshing, and might not be able to get over it at all, unless it goes more the RISC way - which it already seems to be going towards - while retaining all the advantages of CISCs. If that can be done, it should really perform well. But if it can't really address the issue of utilizing overlapping registers and frames, I wonder if it could improve quickly enough to match the RISCs. Then perhaps there might not be a 68050. :( The present 1.2 million transistors on chip is remarkable though - perhaps it could improve that way?

The same source code, when compiled under a DEC 3100 (RISC technology), produced binaries that were usually about 30-40% larger than those compiled under a Sun 4/260 (CISC technology) - but with the prices of RAM being what they are these days, the speed increase clearly matters more, as of now. For example, when I compiled gnuchess on the Sun 4/260, the binary was about 30% smaller than the one produced under the DEC 3100. Interestingly, despite the more advanced compilers present with the DEC, it took slightly more time to compile.
But running it was another story: the DEC was noticeably faster. I did not (g)profile it though, so I can't be precise. RISC seems to be somewhat superior, given the current technological and economic constraints. It seems a bit shaky though.

Raja.

P.S. Will the 68040 be pin compatible with its predecessors? I thought it ought to be, but some articles have said it wasn't? Is there any manual or anything technical available at present for the 68040? Apart from swiping technical secrets from Motorola, that is. :)

raja@{soda,ocf,athena}.berkeley.edu | root@athena.berkeley.edu | laba-3en@web
Programmer Analyst I, S & P Dept, UC Berkeley.
_____________________________________________________________________________
Seeing the basketball team - the New York Knicks - on TV brings a smile to my lips, for in British English, "knicks" are equivalent to "panties". :)
-----------------------------------------------------------------------------
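[A toy sketch in Python of the pipelining point above - the opcodes and lengths are invented for illustration, not any real ISA. With fixed-length instructions, the boundary of instruction i is pure arithmetic, so a pipeline can fetch and decode several instructions ahead; with variable-length encodings, each instruction must be at least partially decoded before the next one can even be located:]

```python
WIDTH = 4  # bytes per instruction, RISC-style

def fixed_boundaries(stream, n):
    """Byte offsets of the first n instructions: pure arithmetic,
    independent of the instruction bytes themselves."""
    return [i * WIDTH for i in range(n)]

def variable_boundaries(stream, lengths_by_opcode, n):
    """CISC-style: the opcode of each instruction must be examined
    before the start of the next instruction is known."""
    offsets, pos = [], 0
    for _ in range(n):
        offsets.append(pos)
        pos += lengths_by_opcode[stream[pos]]  # serial dependence on decode
    return offsets

if __name__ == "__main__":
    # Hypothetical opcodes 0, 1, 2 with lengths 4, 2, 3; byte 9 is filler.
    stream = bytes([0, 9, 9, 9, 1, 9, 2, 9, 9])
    print(fixed_boundaries(stream, 3))                         # [0, 4, 8]
    print(variable_boundaries(stream, {0: 4, 1: 2, 2: 3}, 3))  # [0, 4, 6]
```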
piaw@cory.Berkeley.EDU (Na Choon Piaw) (02/03/90)
In article <1990Feb2.073513.29698@agate.berkeley.edu> laba-3en@web-1c (Raja S Kushalnagar) writes:
>The Motorola 68040, if what I've been reading about here is to be believed,
>has almost been able to surmount the difficulties of deep pipelining, though
>it seems not to have surmounted the issue of efficient window register
>meshing, and probably might not be able to get over it at all, unless it
>goes more the RISC way, which it already seems to be going towards, but while
>retaining all the advantages of CISCs. If that can be done, it should really

I beg to differ. In a study of MIPS vs. RISC by DEC's WRL, it was found that a better compiler will *always* do better by using a large register set instead of wasting all that register space on a register window system. In fact, a prime reason why Berkeley did the RISC with register windows was that the team producing the RISC and RISC II did not include compiler experts. So the 680x0 series could improve by adding more registers and by improving compiler technology (especially register allocation) to take advantage of those registers, as was done with the MIPS.

The problem with a large register window file is that when a context switch occurs, the entire file has to be dumped. Also, should the nesting level of functions exceed the (typically small) register window nesting level, it will cause an overflow and require a complete dump of all the register windows. According to the WRL report (which I can't quote from since I don't have it with me; see the technical report "Register Allocation versus Register Windows" from Digital Equipment Corp. Western Research Labs), this causes such a performance penalty that MIPS, with its better compilers, came out significantly faster. In fact, with the advent of portable compilers, I'd warrant that it'll be cheaper in the long run. (Just do the compiler once.)

>RISC seems to be somewhat superior, given the current technological and
>economic constraints.
>It seems a bit shaky though.

RISC is not shaky. Given the current technology, it is probably the only architecture capable of being implemented on the latest (highest speed) technologies like GaAr.

> Raja.

lachesis@nyquist.bellcore.com
"Atari might not have done better [than C=] had they bought Amiga, but they couldn't have done much worse, either."
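[A back-of-the-envelope Python model of the overflow cost argued above - the window count, registers per window, and traces are invented for illustration, not taken from the WRL report. Once call nesting exceeds the number of hardware windows, every further call spills a window to memory and the matching returns refill them, while shallow call/return patterns stay entirely on chip:]

```python
def window_traffic(trace, n_windows, regs_per_window=16):
    """Words moved between the register file and memory for a trace of
    "call"/"ret" events, given n_windows hardware register windows."""
    resident = 1   # windows currently held on chip (the running frame's)
    traffic = 0
    for op in trace:
        if op == "call":
            if resident == n_windows:
                traffic += regs_per_window   # overflow: spill oldest window
            else:
                resident += 1
        else:  # "ret"
            if resident == 1:
                traffic += regs_per_window   # underflow: refill caller's window
            else:
                resident -= 1
    return traffic

# A long uninterrupted run of calls, then returns, overflows the windows...
deep = ["call"] * 10 + ["ret"] * 10
print(window_traffic(deep, n_windows=8))      # 96 words of spill traffic

# ...but the common nested/looping pattern costs nothing at all.
shallow = ["call", "ret"] * 10
print(window_traffic(shallow, n_windows=8))   # 0
```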
daveh@cbmvax.commodore.com (Dave Haynie) (02/03/90)
In article <1990Feb2.073513.29698@agate.berkeley.edu> laba-3en@web-1c (Raja S Kushalnagar) writes:
>...has almost been able to surmount the difficulties of deep pipelining, though
>it seems not to have surmounted the issue of efficient window register
>meshing, and probably might not be able to get over it at all, unless it
>goes more the RISC way, which it already seems to be going towards, but while
>retaining all the advantages of CISCs. If that can be done, it should really
>perform well. But if it can't really address the issue of utilizing
>overlapping registers and frames, I wonder if it could improve quickly enough
>to match the RISCs.

Only a few of the RISCs out there are massive register machines with register windows, etc. For example, the Motorola 88k has only 32 general purpose registers, and it's currently winning the SPECmark wars. Most RISC chips have at least 32 general registers, but it's only the Berkeley-influenced ones that have register windows (the two I know of are Sparc and the AMD 29K). There were independent RISC projects conducted elsewhere, such as at Stanford, that came up with different RISC ideas. And of course, you can always go back and see what they did in supercomputers 5 or 10 years ago -- most early supercomputers were necessarily RISCy.

>The same source code when compiled under a DEC 3100 (RISC technology) produced
>binaries that were usually about 30-40% larger than those compiled under a Sun
>4/260 (CISC technology)

The Sun 3 series is 68030; all the Sun 4s are Sparcs. The difference you see could be the difference between Sparc and the MIPS CPUs DEC uses. It could also be operating system stuff -- shared libraries drastically cut the size of your binaries, and other things do as well.

>Interestingly despite the more advanced compilers present with the DEC, it took
>slightly more time to compile.

That's exactly what you'd expect, all things being equal. More optimizations and other cleverness will always take longer.
>RISC seems to be somewhat superior, given the current technological and
>economic constraints. It seems a bit shaky though.

The thing everyone needs to realize is that there is no one "RISC". RISC is more like an architectural bag of tricks designed to make faster CPUs. Some of these tricks can be applied to older CPU designs; others probably can't. The one who wins is the one who goes the fastest, regardless of what it took to get there. The winner these days changes on a pretty regular basis. May you live in interesting times.

> Raja.
>P.S. Will the 68040 be pin compatible with its predecessors? I thought
>it ought to be, but some articles have said it wasn't?

No, they refined the bus architecture and various other bits to optimize the '040's path to external memory in the fastest possible case. That just makes it more difficult on designers (another CPU bus to learn, work to adapt it to older systems, etc.), but as long as you get something in return, it's no big deal.
-- 
Dave Haynie  Commodore-Amiga (Systems Engineering)  "The Crew That Never Rests"
{uunet|pyramid|rutgers}!cbmvax!daveh  PLINK: hazy  BIX: hazy
Too much of everything is just enough
navas@cory.Berkeley.EDU (David C. Navas) (02/03/90)
In article <21699@pasteur.Berkeley.EDU> lachesis@nyquist.bellcore.com writes:
>In article <1990Feb2.073513.29698@agate.berkeley.edu> laba-3en@web-1c (Raja S Kushalnagar) writes:
>>The Motorola 68040, if what I've been reading about here is to be believed,
>>has almost been able to surmount the difficulties of deep pipelining, though
>>it seems not to have surmounted the issue of efficient window register
>>meshing, and probably might not be able to get over it at all, unless it
>>goes more the RISC way, which it already seems to be going towards, but while
>>retaining all the advantages of CISCs. If that can be done, it should really
>
>I beg to differ. During a study of MIPS vs. RISC by DEC's WRL, it was found
>that a better compiler will *always* do better for code by using a large
>register set instead of wasting all that register space using a register
>window system.

Yep, I've heard similarly. In reality, I'm sure there are just as many people who think register windowing is a good thing as there are people who think the entire idea is a bit flaky. [For interested parties, I belong to the latter.]

To answer a question of Raja's: the 68040 is *not* pin compatible, for several reasons, most of which have to do with the fact that it no longer supports an external FPU/MMU... [It doesn't have to...]

Byte has a very interesting, but brief, article about the 68040. The magazine also has some very nice things to say about the Amiga in it... AMAZING!!! Well worth the read, I think...

David Navas
navas@cory.berkeley.edu
laba-3en@web-1d.berkeley.edu (Raja S Kushalnagar) (02/04/90)
In article <21699@pasteur.Berkeley.EDU> lachesis@nyquist.bellcore.com writes:
>>In article <1990Feb2.073513.29698@agate.berkeley.edu> laba-3en@web-1c (Raja S Kushalnagar) writes:
>>it seems not to have surmounted the issue of efficient window register
>>meshing, and probably might not be able to get over it at all, unless it
>>goes more the RISC way, which it already seems to be going towards, but while
>>retaining all the advantages of CISCs. If that can be done, it should really
>
>I beg to differ. During a study of MIPS vs. RISC by DEC's WRL, it was found
>that a better compiler will *always* do better for code by using a large
>register set instead of wasting all that register space using a register
>window system. In fact, the prime reason why Berkeley did the RISC with
>register windows was that the team producing the RISC and RISC II did not
>include compiler experts. So, the 680x0 series could improve by adding more
>registers, and improving compiler technology (especially register
>allocation) to take advantage of those registers (as was done with the MIPS).

Yes, it is true that the Berkeley team didn't have any compiler experts, but the emphasis on increased compiler efficiency is more of a software consideration than a hardware one, and shifts the burden excessively onto the compiler experts. And the RISC II did have 32 general registers. True, the window mechanism might not allow as much optimization as, say, the MIPS approach, but it is a hardware solution, and thus amenable to dramatic performance improvements as hardware technology leapfrogs ahead. Hardware technology, by necessity, is usually a couple of years ahead of software technology.

A few years back, people got interested in a software solution for improving CISCs - Writable Control Store (WCS). This was devised to overcome the problem of computers running three or four microcycles per instruction, and to make it possible for most instructions to run in a single microcycle.
It became popular, but eventually fell out of favour due to virtual memory and multiprocess complications. It didn't last more than a couple of years, I think. I am just drawing a comparison - I don't know if it's really justified, but it does seem that way.

>The problem with a large register file is that when a context switch occurs,
>the entire file has to be dumped. Also, the nesting level of functions,
>should they exceed the (typically small) register window nesting level, will
>cause an overflow and require a complete dump of all the register windows.
>According to the WRL report (which I can't quote from since I don't have it
>with me, see the technical report: Register Allocation versus Register
>Windows by Digital Equipment Corp. Western Research Labs), this causes such
>a performance penalty that MIPS, with its better compilers, came out
>significantly faster.

Most procedures have fewer than five or six variables, and almost never more than 10-12, and most of them are heavily used - they account for half to two thirds of the dynamically executed references to operands. Procedure calls are slowed down when there are a great many registers to save. So having multiple register windows (or sets) means that registers do not have to be saved and restored on every procedure call. This works simply because of the generally ordered nature of programs, i.e. nesting and loops: programs rarely execute a long uninterrupted sequence of calls followed by an equally long uninterrupted sequence of returns.

Now, that by itself would probably be too slow to be justified, so the overlapping mechanism is used: rather than copy the parameters from one window to another on each call, windows are overlapped so that some registers are simultaneously part of two windows. By putting parameters into the overlapping registers, operands are passed automatically. A disadvantage, though, is that register windows use more chip area, and of course there are the context switches.
But context switches occur rather rarely - about one for every 100 to 1000 procedure calls. They require that, on average, two or three times as many registers be saved as on chips without the register window mechanism. The penalty is not all that great, considering the speedup. You could very easily make a certain program run much faster on the MIPS than on the RISC, and another program vice versa.

>In fact, with the advent of portable compilers, I'd warrant that it'll be
>cheaper in the long run. (Just do the compiler once)

I beg to differ. You just can't have a generic portable compiler and hope to make it exploit the architecture it is on without major tinkering. Also, optimizing compilers have a drawback in that they are much slower than conventional compilers (half the speed, even), and they ignore the register-saving penalty of procedure calls.

>>RISC seems to be somewhat superior, given the current technological and
>>economic constraints. It seems a bit shaky though.
>
>RISC is not shaky. Given the current technology, it is probably the only
>architecture capable of being implemented on the latest (highest speed)
>technologies like GaAr.

You are probably right. It has taken an awfully long time for GaAr to become viable, though. I remember reading about it as far back as '81, though I doubt I understood what it really meant. :)

>lachesis@nyquist.bellcore.com

Raja.

How can the very concept of junk bonds exist? The US has paid a big penalty for that. If a Milken can get a $550 million salary - almost 0.1% of the entire US population's income - I can't help but wonder.
#############################################################################
raja@{soda,ocf,athena}.berkeley.edu | root@athena.berkeley.edu | laba-3en@web
-----------------------------------------------------------------------------
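[A Python sketch of the overlap trick described above, loosely modeled on the Berkeley RISC II layout - the window size, overlap, and physical register-file size here are invented for illustration. Each window is a slice of one large physical register file, and a call slides the window so the caller's outgoing registers become the callee's incoming ones, passing parameters without a copy:]

```python
WINDOW = 24    # registers visible to a procedure at any moment
OVERLAP = 8    # registers shared between adjacent windows

class RegisterFile:
    """Overlapping register windows over one flat physical file."""
    def __init__(self, n_phys=136):
        self.regs = [0] * n_phys
        self.base = 0          # physical index where the current window starts

    def read(self, r):         # r is a window-relative register number
        return self.regs[self.base + r]

    def write(self, r, value):
        self.regs[self.base + r] = value

    def call(self):            # slide the window; top OVERLAP regs stay visible
        self.base += WINDOW - OVERLAP

    def ret(self):             # slide back; no registers were copied anywhere
        self.base -= WINDOW - OVERLAP

rf = RegisterFile()
rf.write(16, 42)     # caller puts an argument in an "outgoing" register...
rf.call()
print(rf.read(0))    # ...callee sees the same physical register as r0: 42
rf.ret()
print(rf.read(16))   # still there for the caller after the return: 42
```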
poirier@dg-rtp.dg.com (Charles Poirier) (02/07/90)
In article <1990Feb3.211701.15279@agate.berkeley.edu> laba-3en@web-1d (Raja S Kushalnagar) writes:
<In article <21699@pasteur.Berkeley.EDU> lachesis@nyquist.bellcore.com writes:
<>
<>RISC is not shaky. Given the current technology, it is probably the only
<>architecture capable of being implemented on the latest (highest speed)
<>technologies like GaAr.
<
<You are probably right. It has taken an awfully long time for GaAr to become
<viable, though.

Yeah, they've been working long and hard on that Gallium Argonide process. Fast as heck, but the chips always decompose within 10 microseconds. :-)

(Hint: Arsenic is As, not Ar.)

Cheers,
Charles Poirier