[comp.sys.amiga.hardware] 68040 vs RISC

laba-3en@web-1c.berkeley.edu (Raja S Kushalnagar) (02/02/90)

While writing and analyzing the RISC II FPU last summer under my prof,
I noted several interesting things about the RISC vs CISC question.

Pipelining is inherently much easier on a RISC, and can be carried
out over many stages easily, owing to the reduced instruction set and the
fixed length of the instructions, as opposed to the CISC's variable-length
ones.  Also, it is much easier to implement register window meshing on
RISCs.  Those two factors alone, at least in the original Berkeley RISC II,
were responsible for about 40% of the speedup. Add much more efficient
compiling, as was done with the Stanford MIPS design, and that would account
for most of the speedup in today's SPARCs and MIPSes, along with the reduced
instruction set itself.
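As a toy illustration of why fixed-length instructions ease pipelining
(all encodings below are invented for the example, not any real ISA): with
a fixed 4-byte format, the fetch stage can compute the next PC without
waiting for decode, while a variable-length machine must at least partially
decode an instruction before it knows where the next one starts.

```python
# Toy illustration: next-PC computation under fixed- vs. variable-length
# instruction encodings.  The encodings are invented for this sketch.

def risc_next_pc(pc):
    # Fixed 4-byte instructions: fetch can advance the PC immediately,
    # so the fetch and decode stages overlap cleanly.
    return pc + 4

def cisc_next_pc(pc, memory):
    # Variable-length instructions: the length must be extracted from
    # the opcode byte before the next fetch address is known (pretend
    # the low nibble of the opcode encodes the instruction length).
    return pc + (memory[pc] & 0x0F)

mem = {0: 0x26, 6: 0x12}    # a 6-byte instruction, then a 2-byte one
pc = cisc_next_pc(0, mem)   # fetch stalls until the length is decoded
print(risc_next_pc(0), pc, cisc_next_pc(pc, mem))
```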

The Motorola 68040, if what I've been reading here is to be believed,
has almost surmounted the difficulties of deep pipelining, though it seems
not to have surmounted the issue of efficient register window meshing.  It
probably might not get over it at all, unless it goes further the RISC way,
which it already seems to be doing, while retaining all the advantages of
CISCs.  If that can be done, it should really perform well.  But if it
can't really address the issue of utilizing overlapping registers and
frames, I wonder if it could improve quickly enough to match the RISCs.
Then perhaps there might not be a 68050. :(  The present 1.2 million
transistors on chip is remarkable though - perhaps it could improve along
that axis?

The same source code, when compiled under a DEC 3100 (RISC technology),
produced binaries that were usually about 30-40% larger than those compiled
under a Sun 4/260 (CISC technology) - but with the prices of RAM being what
they are these days ... the speed increase clearly matters more, as of now.
For example, when I compiled gnuchess on the Sun 4/260, the binary was about
30% smaller than the one produced under the DEC 3100.  Interestingly,
despite the more advanced compilers present with the DEC, it took slightly
more time to compile.  But running it was another story: the DEC was
noticeably faster.  I did not (g)profile it though, so I can't be precise.

RISC seems to be somewhat superior, given the current technological and 
economic constraints.  It seems a bit shaky though. 

	Raja.

P.S.  Will the 68040 be pin compatible with its predecessors?  I thought 
it ought to be, but some articles have said it isn't.  Is there any 
manual or anything technical available at present for the 68040? 
Apart from swiping technical secrets from Motorola, that is. :)

raja@{soda,ocf,athena}.berkeley.edu | root@athena.berkeley.edu | laba-3en@web
	Programmer Analyst I, S & P Dept, UC Berkeley. 
_____________________________________________________________________________
Seeing the basketball team - the New York Knicks, on TV brings a smile to my 
lips, for in Brit English, "knicks" are equivalent to "panties". :)
-----------------------------------------------------------------------------

piaw@cory.Berkeley.EDU (Na Choon Piaw) (02/03/90)

In article <1990Feb2.073513.29698@agate.berkeley.edu> laba-3en@web-1c (Raja S Kushalnagar) writes:
>The Motorola 68040, if what I've been reading about here is to be believed,
>has almost been able to surmount the difficulties of deep pipelining, though
>it seems not to have surmounted the issue of efficient window register 
>meshing, and probably might not be able to get over it at all, unless it 
>goes more the RISC way, which it already seems to be going towards, but while
>retaining all the advantages of CISCs.  If that can be done, it should really

I beg to differ.  In a study of MIPS vs. RISC by DEC's WRL, it was found
that a better compiler will *always* generate better code by using a large
register set than by spending all that register space on a register
window system.  In fact, a prime reason why Berkeley did the RISC with
register windows was that the team producing the RISC and RISC II did not
include compiler experts.  So the 680x0 series could improve by adding more
registers, and by improving compiler technology (especially register
allocation) to take advantage of those registers (as was done with the MIPS).
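For what it's worth, here is a minimal sketch of the kind of register
allocation being talked about - a single linear-scan pass over variable
live ranges.  This is a hypothetical toy, not the MIPS compiler's actual
algorithm, and the interval data is made up:

```python
# Minimal linear-scan register allocation over live ranges.
# Each interval is (name, start, end): the span where the value is live.

def linear_scan(intervals, n_regs):
    active = []                  # (end, reg) pairs currently live
    free = list(range(n_regs))   # physical registers not in use
    assignment, spills = {}, []
    for name, start, end in sorted(intervals, key=lambda t: t[1]):
        # Expire intervals that ended before this one starts,
        # returning their registers to the free pool.
        for e, r in list(active):
            if e < start:
                active.remove((e, r))
                free.append(r)
        if free:
            r = free.pop()
            assignment[name] = r
            active.append((end, r))
        else:
            spills.append(name)  # no register left: spill to memory
    return assignment, spills

alloc, spilled = linear_scan([("x", 0, 10), ("y", 2, 4), ("z", 5, 8)], 2)
print(alloc, spilled)   # "z" reuses the register freed when "y" dies
```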

The problem with a large register file is that when a context switch occurs,
the entire file has to be dumped.  Also, if the nesting level of functions
exceeds the (typically small) register window nesting depth, an overflow
occurs and requires a complete dump of all the register windows.
According to the WRL report (which I can't quote from since I don't have it
with me, see the technical report: Register allocation versus Register
windows by Digital Equipment Corp. Western Research Labs), this causes such
a performance penalty that MIPS, with its better compilers, came out
significantly faster.
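A back-of-the-envelope caricature of the tradeoff the report describes
(every number below is invented for illustration): a flat register file
with a good allocator pays a small save cost on every call, while a
windowed file pays nothing on most calls but dumps wholesale on an
overflow or context switch.  Which scheme wins depends entirely on how
often those bulk dumps happen.

```python
# Crude cost model: cycles spent saving registers across many calls.
# All parameters here are invented, purely to show the crossover.

STORE = 2  # hypothetical cycles per register store

def flat_saves(calls, live=6):
    # Flat file + good allocator: save only the live registers per call.
    return calls * live * STORE

def windowed_saves(calls, dump_rate, file_size=64):
    # Windowed file: most calls are free, but an overflow (or context
    # switch) dumps the whole file, as the article above claims.
    return int(calls * dump_rate) * file_size * STORE

calls = 100_000
for rate in (0.01, 0.05, 0.20):
    print(rate, flat_saves(calls), windowed_saves(calls, rate))
```

With these made-up numbers the windowed machine wins at a 1% dump rate
and loses badly at 20%; the WRL result amounts to measuring where real
workloads fall on that curve.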

In fact, with the advent of portable compilers, I'd warrant that it'll be
cheaper in the long run.  (Just do the compiler once)

>RISC seems to be somewhat superior, given the current technological and 
>economic constraints.  It seems a bit shaky though. 

RISC is not shaky.  Given the current technology, it is probably the only
architecture capable of being implemented on the latest (highest speed)
technologies like GaAr.

>	Raja.

lachesis@nyquist.bellcore.com

"Atari might not have done better [than C=] had they bought Amiga, but they
couldn't have done much worse, either."

daveh@cbmvax.commodore.com (Dave Haynie) (02/03/90)

In article <1990Feb2.073513.29698@agate.berkeley.edu> laba-3en@web-1c (Raja S Kushalnagar) writes:

>...has almost been able to surmount the difficulties of deep pipelining, though
>it seems not to have surmounted the issue of efficient window register 
>meshing, and probably might not be able to get over it at all, unless it 
>goes more the RISC way, which it already seems to be going towards, but while
>retaining all the advantages of CISCs.  If that can be done, it should really
>perform well. But if it can't really address the issue of utilizing 
>overlapping registers and frames, I wonder if it could improve quickly enough
>to match the RISCs.  

Only a few of the RISCs out there are massive register machines with register
windows, etc.  For example, the Motorola 88k has only 32 general-purpose
registers, and it's currently winning the SPECmark wars.  Most RISC chips have
at least 32 general registers, but it's only the Berkeley-influenced ones that
have register windows (the two I know of are Sparc and the AMD 29K).  There
were independent RISC projects conducted elsewhere, such as at Stanford, that
came up with different RISC ideas.  And of course, you can always go back and
see what they did in supercomputers 5 or 10 years ago -- most early
supercomputers were necessarily RISCy.

>The same source code when compiled under a DEC 3100 (RISC technology) produced
>binaies that were usually about 30-40% larger than those compiled under a Sun
> 4/260 (CISC technology) 

The Sun 3 series is 68030; all the Sun 4s are Sparcs.  The difference you see
could be the difference between Sparc and the MIPS CPUs DEC uses.  It could also
be operating system stuff -- shared libraries drastically cut the size of
your binaries, other things do as well.

>Interestingly despite the more advanced compilers present with the DEC, it took 
>slightly more time to compile.  

That's exactly what you'd expect, all things being equal.  More optimizations
and other cleverness will always take longer.

>RISC seems to be somewhat superior, given the current technological and 
>economic constraints.  It seems a bit shaky though. 

The thing everyone needs to realize is that there is no one "RISC".  RISC is
more like an architectural bag of tricks designed to make faster CPUs.  Some of 
these tricks can be applied to older CPU designs, others probably can't.  The
one who wins is the one who goes the fastest, regardless of what it took to
get there.  The winner these days changes on a pretty regular basis.  May you
live in interesting times.

>	Raja.

>P.S.  Will the 68040 be pin compatible with its predecessors?  I thought 
>it ought to be, but some articles have said it isn't.  

No, they refined the bus architecture and various other bits to optimize
the '040's path to external memory in the fastest possible case.  That
just makes life more difficult for designers (another CPU bus to learn, work
to adapt it to older systems, etc.), but as long as you get something in 
return, it's no big deal.

-- 
Dave Haynie Commodore-Amiga (Systems Engineering) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
                    Too much of everything is just enough

navas@cory.Berkeley.EDU (David C. Navas) (02/03/90)

In article <21699@pasteur.Berkeley.EDU> lachesis@nyquist.bellcore.com writes:
>In article <1990Feb2.073513.29698@agate.berkeley.edu> laba-3en@web-1c (Raja S Kushalnagar) writes:
>>The Motorola 68040, if what I've been reading about here is to be believed,
>>has almost been able to surmount the difficulties of deep pipelining, though
>>it seems not to have surmounted the issue of efficient window register 
>>meshing, and probably might not be able to get over it at all, unless it 
>>goes more the RISC way, which it already seems to be going towards, but while
>>retaining all the advantages of CISCs.  If that can be done, it should really
>
>I beg to differ.  During a study of MIPS vs. RISC by DEC's WRL, it was found
>that a better compiler will *always* do better for code by using a large
>register set instead of wasting all that register space using a register
>window system.

   Yep, I've heard similarly.

   In reality, I'm sure there are just as many people who think that 
   register windowing is a good thing as there are people who think
   the entire idea is a bit flaky.  [For interested parties, I belong
   to the latter camp.]

   To answer a question of Raja's...  The 68040 is *not* pin compatible
   for several reasons, most of which have to do with the fact that they
   no longer support external FPU/MMUs...  [They don't have to...]

   Byte has a very interesting, but brief, article about the 68040.
   The magazine also has some very nice things to say about the Amiga
   in it...  AMAZING!!!  Well worth the read, I think...

David Navas
navas@cory.berkeley.edu

laba-3en@web-1d.berkeley.edu (Raja S Kushalnagar) (02/04/90)

In article <21699@pasteur.Berkeley.EDU> lachesis@nyquist.bellcore.com writes:
>>In article <1990Feb2.073513.29698@agate.berkeley.edu> laba-3en@web-1c (Raja S Kushalnagar) writes:
>>it seems not to have surmounted the issue of efficient window register 
>>meshing, and probably might not be able to get over it at all, unless it 
>>goes more the RISC way, which it already seems to be going towards, but while
>>retaining all the advantages of CISCs.  If that can be done, it should really
>
>I beg to differ.  During a study of MIPS vs. RISC by DEC's WRL, it was found
>that a better compiler will *always* do better for code by using a large
>register set instead of wasting all that register space using a register
>window system.  In fact, the prime reason why Berkeley did the RISC with
>register windows was that the team producing the RISC and RISC II did not
>include compiler experts.  So, the 680x0 series could improve by adding more
>registers, and improving compiler technology (especially register
>allocation) to take advantage of those registers (as was done with the MIPS).

Yes, it's true that the Berkeley team didn't have any compiler experts, 
but the emphasis on increased compiler efficiency is more of a 
software consideration than a hardware one, and shifts the burden excessively 
onto the compiler experts.  And the RISC II did have 32 general registers. 
True, the window mechanism might not allow as much optimization as, say, the
MIPS, but it is a hardware solution, and thus amenable to dramatic performance
improvements as hardware technology leapfrogs ahead.  Hardware technology, 
by necessity, is usually a couple of years ahead of software technology. 

A few years back, people got interested in a software solution for 
improving CISCs - the Writable Control Store (WCS).  This was devised to 
overcome the problem of computers running three or four microcycles per 
instruction, and to make it possible for most instructions to run in a
single microcycle.  It became popular, but eventually fell out of favour due
to virtual memory and multiprocess complications.  It didn't last more than 
a couple of years, I think. 

I am just drawing a comparison - I don't know if it's really justified, but
it does seem that way. 

>The problem with a large register file is that when a context switch occurs,
>the enter file has to be dumped.  Also, the nesting level of functions,
>should they exceed the (typically small) register window nesting level, will
>cause an overflow and require a complete dump of all the register windows.
>According to the WRL report (which I can't quote from since I don't have it
>with me, see the technical report: Register allocation versus Register
>windows by Digital Equipment Corp. Western Research Labs), this causes such
>a performance penalty that MIPS, with its better compilers, came out
>significantly faster.

Most procedures have fewer than five or six variables, and almost never
more than 10-12, and most of them are heavily used - they account for a half
to two thirds of the dynamically executed references to operands.  Procedure
calls are slowed down when a great many registers must be saved.  So, having
multiple register windows (or sets) means that registers need not be saved 
and restored on every procedure call.  This works simply because of the 
generally ordered nature of programs - i.e., nesting and loops.  Programs 
rarely execute a long uninterrupted sequence of calls followed by an equally
long uninterrupted sequence of returns.  Now, that by itself would probably 
be too slow to be justified, so the overlapping mechanism is used: rather
than copy the parameters from one window to another on each call, windows
are overlapped so that some registers are simultaneously part of two
windows.  By putting parameters into the overlapping registers, operands
are passed automatically.  A disadvantage, though, is that register windows
use more chip area, and of course there are the context switches.  But 
context switches occur rather rarely - about one for every 100 to 1000 
procedure calls.  They require that, on average, two or three times as many
registers be saved as on chips without the register window mechanism.  The
penalty is not all that great, considering the speedup.  
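The overlap mechanism described above can be sketched in a few lines.  The
window and overlap sizes here are invented for the example, not the actual
RISC II parameters:

```python
# Toy model of overlapping register windows (RISC II-style, simplified).
WINDOW = 16    # registers visible to a procedure (hypothetical)
OVERLAP = 4    # registers shared between caller and callee (hypothetical)

class WindowFile:
    def __init__(self, n_regs=64):
        self.regs = [0] * n_regs
        self.cwp = 0    # current window pointer (base of visible window)

    def _phys(self, i):
        # Map register i of the current window onto the physical file,
        # wrapping around when the windows are exhausted.
        return (self.cwp + i) % len(self.regs)

    def write(self, i, val):
        self.regs[self._phys(i)] = val

    def read(self, i):
        return self.regs[self._phys(i)]

    def call(self):
        # Advance the window so the caller's top OVERLAP registers
        # become the callee's bottom OVERLAP registers -- parameters
        # are "passed" without copying a single value.
        self.cwp += WINDOW - OVERLAP

    def ret(self):
        self.cwp -= WINDOW - OVERLAP

wf = WindowFile()
wf.write(WINDOW - OVERLAP, 42)  # caller puts an argument in the overlap
wf.call()
print(wf.read(0))               # callee sees it as its own register 0
```

A real implementation also needs the overflow trap discussed earlier: when
`cwp` wraps past the last free window, the oldest window must be spilled
to memory.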

You could very easily make a certain program run much faster on the MIPS 
than on the RISC and another program vice-versa. 

>In fact, with the advent of portable compilers, I'd warrant that it'll be
>cheaper in the long run.  (Just do the compiler once)

I beg to differ.  You just can't have a generic portable compiler and hope
to make it exploit the architecture it is on without major tinkering. 
Also, optimizing compilers have the drawback that they are much slower 
than conventional compilers (half the speed, even), and that they ignore the
register-saving penalty of procedure calls. 

>>RISC seems to be somewhat superior, given the current technological and 
>>economic constraints.  It seems a bit shaky though. 
>
>RISC is not shaky.  Given the current technology, it is probably the only
>architecture capable of being implemented on the latest (highest speed)
>technologies like GaAr.

You are probably right. It has taken an awfully long time for GaAr to become
viable, though. I remember reading about them as far back as '81, though I 
doubt I understood what it really meant. :)

>lachesis@nyquist.bellcore.com

Raja.
How can the very concept of junk bonds exist? The US has paid a big penalty 
for that. If a Milken can get a 550 million dollar salary - almost 0.1%
of the entire US population's income - I can't help but wonder. 
#############################################################################
raja@{soda,ocf,athena}.berkeley.edu | root@athena.berkeley.edu | laba-3en@web
-----------------------------------------------------------------------------

poirier@dg-rtp.dg.com (Charles Poirier) (02/07/90)

In article <1990Feb3.211701.15279@agate.berkeley.edu> laba-3en@web-1d (Raja S Kushalnagar) writes:
<In article <21699@pasteur.Berkeley.EDU> lachesis@nyquist.bellcore.com writes:
<>
<>RISC is not shaky.  Given the current technology, it is probably the only
<>architecture capable of being implemented on the latest (highest speed)
<>technologies like GaAr.
<
<You are probably right. It has taken an awfully long time for GaAr to become
<viable, though.

Yeah, they've been working long and hard on that Gallium Argonide process.
Fast as heck, but the chips always decompose within 10 microseconds.  :-)

(Hint: Arsenic is As, not Ar.)

	Cheers,
	Charles Poirier