[comp.arch] Low End NeXTs

sef@kithrup.COM (Sean Eric Fagan) (04/04/91)

I would suggest that this thread continue in comp.arch, which is more suited
to it.

In article <27fa3350.6bc2@petunia.CalPoly.EDU> araftis@polyslo.CalPoly.EDU (Alex Raftis) writes:
>What makes you say this? Sure RISC is nice, and it has its place, but
>look at what advantages RISC gives. It has few instructions, so they execute
>quickly. That's nice, but each instruction does very little, so you need
>five instructions to do the same things a CISC processor can do.

Which end up not being used.

Fact:  the current generation RISCs are almost invariably "faster" than the
current generation CISCs.  Consider 68030 vs. R3000 (I believe those were
about comparable).  Consider 68040 vs. R6000.  In both cases, the MIPS chip
is "faster" (the SPECmarks are higher, by a considerable margin).

Few compilers use any of the nifty instructions that a CISC has.  gcc, for
example, uses a relatively mundane set of instructions for the 68k line.
Most instructions go unused this way in a CISC; on the other hand, most of
the instructions that the MIPS series has *are* used by the compiler (the
exceptions being about a dozen or so that are used to do things like page
control, processor control/setup, etc.).

>A RISC
>processor has a lot of registers to work with for speed. Well, the 68040
>has sixteen registers, which I find more than plenty when programming.

Maybe you do, but I can come up with cases where it isn't.  Take the
MIPS again.  It has 32 integer registers (one of which is hardwired to 0, I
believe).  Some of the registers are, by convention, used to pass in
subroutine arguments.  Result?  Subroutine calls are *fast*.  Also, the
return address goes into r31, not onto a stack; this is also quite a bit
faster than the 68040 (MIPS does:  r31 := pc+1; pc := addr, while the 68k
has to do quite a bit more, including storing into memory, decrementing the
sp, and *then* incrementing the pc, as well as parsing the addressing mode).
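
A minimal sketch of that difference, in Python, assuming a toy machine
model.  The Machine class, the WORD size, and both function names are made
up for illustration, and the model ignores details such as the MIPS branch
delay slot (the real jal links to the instruction after the delay slot) and
the 68k addressing-mode decode.  It only counts the memory stores a call
causes.

```python
from dataclasses import dataclass, field

WORD = 4  # bytes per return address; assumed the same for both toy machines


@dataclass
class Machine:
    """Toy CPU state; hypothetical, not a model of any real chip."""
    pc: int = 0
    sp: int = 0x1000
    link: int = 0                       # stands in for MIPS r31
    memory: dict = field(default_factory=dict)
    mem_writes: int = 0                 # stores caused by subroutine calls


def call_via_link_register(m: Machine, target: int) -> None:
    """RISC-style call: return address goes into a register, no memory touched."""
    m.link = m.pc + WORD
    m.pc = target


def call_via_stack(m: Machine, target: int) -> None:
    """68k-style call: decrement sp, store return address, then jump."""
    m.sp -= WORD
    m.memory[m.sp] = m.pc + WORD
    m.mem_writes += 1
    m.pc = target


risc = Machine(pc=0x100)
call_via_link_register(risc, 0x400)

cisc = Machine(pc=0x100)
call_via_stack(cisc, 0x400)

print(risc.mem_writes, cisc.mem_writes)  # prints: 0 1
```

A leaf routine on the link-register machine never has to touch memory for
the call at all; a non-leaf routine still has to save the link register
itself, so the advantage is largest for small leaf functions.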

>What are some of its disadvantages? Well, floating point work generally
>is more difficult. They're also nearly impossible to work with at the assembly
>level due to the amount of work the programmer has to do to make the 
>advantages of RISC, like pipelining, work.

So?  How often do you program at the assembly level for unix?  I rarely do;
when I do, I use gcc to help.  The MIPS assembler takes care of pipeline
slots, so you don't have to deal with *that* part of it.  In addition, you
*do* realize that the R6000 has essentially the same FP instructions that
the 68040 does?  Yeah, that's right.  The 68040 got rid of *tons* of
instructions.  There goes *that* part of your "CISC is better than RISC"
argument.

>On the other hand, look at the 68040. It executes instructions at around
>1.3 cycles per instruction.

That depends on your instruction stream.  The R3000 gets about the same
ratio, although I seem to recall "1.2"; however, I don't have any reference
to that, so that's just hearsay.

>It gives you lots of registers. 

So does the R3000.

>It's easy to
>work with at the assembly level while it's easy to write compilers for.

Actually, I have no trouble writing in assembly for any of the current RISC
chips.  And the R3000 is easier to write a compiler for than the 68k;
there's a lot of stuff you don't have to worry about.  (For example, all the
addressing modes.  Sure, they make your code smaller, but not necessarily
faster.)

>Its only major problem is
>in clock speed. A 25 MHz 68040 can beat a SPARC at 25MHz, but the SPARC's
>top speed is 40MHz, which will easily beat the 040. Motorola claims to be
>working on a 50MHz version of the 040, but I don't have any idea of when
>they claim this will be released.

I don't like the SPARC.  It's got an ugly instruction set.  How about the
RIOS or the MIPS?  They are acknowledged to be the fastest chips, generally
(although the new HP chip[set?] seems to be pretty fast as well).

>With their current strategy
>they can work their way into the workstation market with their 25 MHz
>cube, which requires low cost memory and support hardware, while they wait
>for faster versions of the processor to come onto the market. 

When a 50MHz 68040 comes out, I expect MIPS to come out with a 75MHz (or
faster) R4000; this will be able to get 150MIPS max, and should average
around 100MIPS.  I know that doesn't mean much; let's just say it will have
such impressive SPECmarks (assuming the system is designed to keep pace)
that Motorola would have to come out with a 120MHz 68040 to keep pace.
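
The 150MIPS ceiling is just clock rate times instructions issued per clock.
A sketch of that arithmetic, assuming two instructions per 75MHz clock
(which is what 150MIPS at 75MHz implies, not a published spec); the function
names are made up for illustration.

```python
def peak_mips(clock_mhz: float, instructions_per_clock: float) -> float:
    """Peak native MIPS = clock (MHz) x instructions issued per clock."""
    return clock_mhz * instructions_per_clock


def implied_instructions_per_clock(mips: float, clock_mhz: float) -> float:
    """Work backwards from a MIPS figure to instructions per clock."""
    return mips / clock_mhz


# Assumed issue rate of 2 per clock gives the 150MIPS maximum quoted above.
print(peak_mips(75, 2))

# An average of 100MIPS at 75MHz implies about 1.33 instructions per clock,
# i.e. roughly 0.75 cycles per instruction.
print(implied_instructions_per_clock(100, 75))
```

As the post itself says, native MIPS figures do not compare across
architectures; the sketch only shows where the two numbers come from.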

>Once this
>occurs, they can begin to release faster versions of the Cube, while still
>selling their current models as the Workstation for the rest of us.

You can only make things go so fast with any given technology.  Using the
technology that puts all of those transistors on a 68040, the RISC people
will make a much faster chip (see R6000, RIOS, R4000 when it comes out).

Go read _Computer Architecture:  A Quantitative Approach_, by Hennessy and
Patterson.  Then make your arguments again.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

mash@mips.com (John Mashey) (04/04/91)

In article <1991Apr03.232400.1560@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>I would suggest that this thread continue in comp.arch, which is more suited
>to it.

>In article <27fa3350.6bc2@petunia.CalPoly.EDU> araftis@polyslo.CalPoly.EDU (Alex Raftis) writes:

>>On the other hand, look at the 68040. It executes instructions at around
>>1.3 cycles per instruction.
It is very difficult to verify such a number, or make reasonable
comparisons, without being the architects.  As noted in an earlier posting,
if you try to use SPECint/MHz, you find that the best CISC micros,
with 128KB external caches, get about .5 - .53 (@ 25MHz), i.e.,
12.9 - 13.3 for 68040 & 486.  RISCs get as much as .75-.80
(MIPS, IBM, HP PA). Using that as inverse of CPI, one gets
cycles/SPECint of 1.88 (at best) for the CISCs, and 1.25 for RISCs.
Of course, FP is a more variable story; the RISCs have, at minimum,
some additional advantage beyond what they show on integer,
and often a lot more.
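
The arithmetic behind those figures, as a rough sketch in Python.  The
function names are made up for illustration, and the quotient is a
per-clock performance proxy, not a count of native instructions.

```python
def specint_per_mhz(specint: float, clock_mhz: float) -> float:
    """Benchmark score normalized by clock rate."""
    return specint / clock_mhz


def cycles_per_specint(ratio: float) -> float:
    """Inverse of SPECint/MHz: clock cycles 'spent' per unit of SPECint."""
    return 1.0 / ratio


# Best CISC figure quoted above: 13.3 SPECint at 25MHz.
print(round(specint_per_mhz(13.3, 25), 3))      # 0.532

# Inverting the quoted ratios gives the cycles/SPECint numbers.
print(round(cycles_per_specint(0.53), 2))       # 1.89 (quoted as 1.88)
print(round(cycles_per_specint(0.80), 2))       # 1.25
```

The point of the normalization is only to take clock rate out of the
comparison; two chips with the same ratio do the same benchmark work per
clock, whatever their instruction counts are.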
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086

lewine@cheshirecat.webo.dg.com (Donald Lewine) (04/05/91)

|> 
|> >In article <27fa3350.6bc2@petunia.CalPoly.EDU> araftis@polyslo.CalPoly.EDU (Alex Raftis) writes:
|> 
|> >>On the other hand, look at the 68040. It executes instructions at around
|> >>1.3 cycles per instruction.
|> It is very difficult to verify such a number, or make reasonable
|> comparisons, without being the architects.  As noted in an earlier posting,
|> if you try to use SPECint/MHz, you find that the best CISC micros,
|> with 128KB external caches, get about .5 - .53 (@ 25MHz), i.e.,
|> 12.9 - 13.3 for 68040 & 486.  RISCs get as much as .75-.80
|> (MIPS, IBM, HP PA). Using that as inverse of CPI, one gets
|> cycles/SPECint of 1.88 (at best) for the CISCs, and 1.25 for RISCs.
|> Of course, FP is a more variable story; the RISCs have, at minimum,
|> some additional advantage beyond what they show on integer,
|> and often a lot more.

ERROR: Invalid mixing of Data in line 9!

When they say that the 68040 takes 1.3 cycles per instruction, they
mean native 68040 instructions.  You cannot infer anything about
clocks per native instruction from a SPECint number.

A VAX, a Data General MV/40000, a MIPS R3000 and a 486 all execute
a vastly different number of native instructions when running the
spec suite.

I fully agree with your first statement, "It is very difficult to
verify such a number."  It is possible to count the instructions
executed and the number of clock cycles required; however, it is
only easy if you have special hardware and/or simulation tools.
The architects tend to have these tools.
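
A small sketch of why the benchmark score alone does not give you native
CPI.  All counts below are hypothetical, chosen only for illustration, and
the function names are made up.  Two machines at the same clock and the
same native CPI finish the same benchmark at different times simply because
they execute different numbers of native instructions.

```python
def native_cpi(cycles: int, instructions: int) -> float:
    """Clocks per native instruction; needs both counts measured directly."""
    return cycles / instructions


def run_time_seconds(instructions: int, cpi: float, clock_hz: float) -> float:
    """Run time = instruction count x CPI / clock rate."""
    return instructions * cpi / clock_hz


CLOCK_HZ = 25_000_000  # assumed: both machines run at 25MHz

# Hypothetical measurements for one benchmark on two architectures.
machine_a = {"instructions": 100_000_000, "cycles": 130_000_000}
machine_b = {"instructions": 140_000_000, "cycles": 182_000_000}

for name, m in (("A", machine_a), ("B", machine_b)):
    cpi = native_cpi(m["cycles"], m["instructions"])
    secs = run_time_seconds(m["instructions"], cpi, CLOCK_HZ)
    print(name, round(cpi, 2), round(secs, 2))
```

Both machines come out at 1.3 cycles per native instruction, yet B takes
40% longer.  Going the other way, from run time back to CPI, requires the
instruction count, which is exactly what the benchmark score does not
contain.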

--------------------------------------------------------------------
Donald A. Lewine                (508) 870-9008 Voice
Data General Corporation        (508) 366-0750 FAX
4400 Computer Drive. MS D112A
Westboro, MA 01580  U.S.A.

uucp: uunet!dg!lewine   Internet: lewine@cheshirecat.webo.dg.com

doug@eris.berkeley.edu (Doug Merritt) (04/06/91)

In article <1991Apr4.125122.1@capd.jhuapl.edu> waltrip@capd.jhuapl.edu writes:
>	He observes that "Few compilers use any of the nifty instructions that
>	a CISC has."  I believe I read recently that RISC was based on the
>	observation that, in fact, only about 30% of the instructions in CISC
>	computers were used by compilers.  The rest of the instructions, for
>	all practical purposes, were just excess baggage.

Not only that, but even when the other instructions and addressing modes
*are* used, it doesn't help that much. A few years ago I wrote a compiler
(code generator, actually) that used essentially all of the addressing
modes, and almost all of the instructions in the 68020. The results were
not encouraging. In the absolute best case, static code density improved
maybe 25%, but usually closer to 0% (break-even). This is because the extra
features almost always take as many bytes to encode in a single fancy
instruction as the equivalent multi-instruction sequence. Dynamic code
speed showed similar results.

(The effort wasn't wasted, though, because the original prototype code
generator was highly suboptimal even for simple instructions.)

Anyway, the point is that even if compilers *do* use every possible feature
of a CISC, it still usually won't make the CISC s/w competitive with RISC
s/w. CISC cpus almost never put sufficient h/w optimization into the support
of the fancy instructions. (There are exceptions to this, of course, and I
haven't been watching the 030 and 040 to see how they do in this regard.
CISC may yet rise again, but not until after superscalar RISC has been
completely exploited.)
	Doug
--
Doug Merritt		doug@eris.berkeley.edu (ucbvax!eris!doug)
		or	uunet.uu.net!crossck!dougm

henry@zoo.toronto.edu (Henry Spencer) (04/07/91)

In article <1991Apr03.232400.1560@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>Fact:  the current generation RISCs are almost invariably "faster" than the
>current generation CISCs.  Consider 68030 vs. R3000 (I believe those were
>about comparable).  Consider 68040 vs. R6000.  In both cases, the MIPS chip
>is "faster" (the SPECmarks are higher, by a considerable margin).

Or for a *really* straight comparison, due to John Mashey I think, compare
the i860 to the i486:  same tools, same process, same chip size, roughly
the same release time... and the RISC machine is faster, much faster, in
every way.
-- 
"The stories one hears about putting up | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 are all true."  -D. Harrison|  henry@zoo.toronto.edu  utzoo!henry

henry@zoo.toronto.edu (Henry Spencer) (04/07/91)

In article <1991Apr5.172533.6717@agate.berkeley.edu> doug@eris.berkeley.edu (Doug Merritt) writes:
>... CISC cpus almost never put sufficient h/w optimization into the support
>of the fancy instructions. (There are exceptions to this, of course, and I
>haven't been watching the 030 and 040 to see how they do in this regard...

Actually, on the 040, by the published descriptions, the situation is very
simple:  the 68000 subset, plus 32-bit absolute addresses, minus indexed
addressing, is fast.  Everything else -- all the goo the 020 added -- is slow.
-- 
"The stories one hears about putting up | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 are all true."  -D. Harrison|  henry@zoo.toronto.edu  utzoo!henry

firth@sei.cmu.edu (Robert Firth) (04/08/91)

In article <1991Apr7.065105.25586@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:

>Actually, on the 040, by the published descriptions, the situation is very
>simple:  the 68000 subset, plus 32-bit absolute addresses, minus indexed
>addressing, is fast.  Everything else -- all the goo the 020 added -- is slow.

My rough calculations (again, based on the documentation) agree with
Henry's.  As with the 68020, the indexed address modes are on balance
slower than the equivalent naive code using only the 68000 address
modes.  Once in a while, you'll be able to avoid an extra register
save and restore, for a marginal gain; but in general, forget them.

jesup@cbmvax.commodore.com (Randell Jesup) (04/15/91)

In article <23724@as0c.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>My rough calculations (again, based on the documentation) agree with
>Henry's.  As with the 68020, the indexed address modes are on balance
>slower than the equivalent naive code using only the 68000 address
>modes.  Once in a while, you'll be able to avoid an extra register
>save and restore, for a marginal gain; but in general, forget them.

	I mostly agree, though note that the '020 had more overhead on the
new modes than the '030 did - on an '030, some of the modes are actually
a win (I think d8(An,Xn*scale) becomes a win, for example, in certain cases).
I use a compiler that has separate '020- and '030-specific optimizations, and
on an '030 it is more willing to use some of the new addressing modes.  I
don't know if the '040 has maintained the split-point of utility, but I
wouldn't be surprised if it were in about the same spot.
 
	The 32-bit multiply and divide on the '020 and above are a large
win.  For certain types of code (not the general case - cpu blitter code,
for example) the bitfield instructions are a win, as of course is the
barrel shifter.

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
Disclaimer: Nothing I say is anything other than my personal opinion.
Thus spake the Master Ninjei: "To program a million-line operating system
is easy, to change a man's temperament is more difficult."
(From "The Zen of Programming")  ;-)