[comp.sys.m68k] Recent Motorola ad seen in Byte

root@sbcs.UUCP (03/23/87)

(Enable Flame)

What did you, fellow Usenetters, think of the recent Motorola advertisement
(April '87 Byte) about 68020 -vs- 80386?  Used to be that Intel dispensed
copious amounts of smoke when comparing their processors vs those from
Motorola.  It seems now that Intel has a somewhat serious machine, the shoe 
is on the other foot and Motorola seems to be running the smear campaign.

While I am on the subject, did anyone else catch Scott McNealy's (Sun Micro.
Prez) comments regarding the '386?  They went something along these lines
(rough quote): "no high performance Unix exists for the chip - therefore it
has architectural problems".  I suppose the 68020 had the same architectural
problems when it first came out, but they just "got better".  Oh well, I
guess company presidents should be kept in a Bell jar when not discussing
yachting, eh?

(Disable Flame)

						Rick Spanbauer
						SUNY/Stony Brook

wheels@mks.UUCP (03/26/87)

In article <362@sbcs.UUCP>, root@sbcs.UUCP (Root) writes:
> What did you, fellow Usenetters, think of the recent Motorola advertisement
> (April '87 Byte) about 68020 -vs- 80386? ...
>  ... and Motorola seems to be running the smear campaign.

I saw the ad, but I'm not familiar with the benchmarks mentioned, nor with
Intel's methods. Were the comments about Intel's timings wrong? If not,
what's your beef?
-- 
Gerry Wheeler                  {seismo,decvax,ihnp4}!watmath!mks!wheels
Mortice Kern Systems Inc.

lodman@ncr-sd.UUCP (03/27/87)

In article <362@sbcs.UUCP> root@sbcs.UUCP (Root) writes:
>What did you, fellow Usenetters, think of the recent Motorola advertisement
>(April '87 Byte) about 68020 -vs- 80386?  Used to be that Intel dispensed
>copious amounts of smoke when comparing their processors vs those from
>Motorola.  It seems now that Intel has a somewhat serious machine, the shoe 
>is on the other foot and Motorola seems to be running the smear campaign.

I have seen some ads from Motorola (the apples and oranges ad)
and I thought what they said was fairly interesting. We had heard
that they cheated on their dhrystone etc. benchmarking long before
Motorola claimed this was so. I haven't seen the ad in question yet.

If Intel at last has a serious machine, it would seem that they have
a credibility gap to close, which was opened after years of 8088,8086
etc. garbage that they sold and people bought. I will believe it
when I have a '386 in my hot little hand. Until then, I'm a little
skeptical.

Personally, I find Motorola's claims much more believable.


-- 
Michael Lodman
Advanced Development NCR Corporation E&M San Diego
mike.lodman@SanDiego.NCR.COM 
{sdcsvax,cbatt,dcdwest,nosc.ARPA,ihnp4}!ncr-sd!lodman

mash@mips.UUCP (03/27/87)

In article <362@sbcs.UUCP> root@sbcs.UUCP (Root) writes:
>(Enable Flame)
>
>What did you, fellow Usenetters, think of the recent Motorola advertisement
>(April '87 Byte) about 68020 -vs- 80386?  Used to be that Intel dispensed
>copious amounts of smoke when comparing their processors vs those from
>Motorola.  It seems now that Intel has a somewhat serious machine, the shoe 
>is on the other foot and Motorola seems to be running the smear campaign.

This ad appeared in several places last Fall; I saw it in Computer System
News.  It's a very clever and well-done ad, especially with the Intel "apples"
that look like oranges when cut open.  On the other hand, in the same
issue, Motorola was telling us how the 25MHz 68020 was "5MIPS sustained,
12.5Mips (burst mode)".  In the Byte ad, we see "Choosing the world's
highest-performance 32-bit microprocessor...The MC68020 is still the
highest-performance microprocessor..."  They cited Whetstones & Dhrystones.
Draw your own conclusions from the following [a slice of a Performance Brief
I'm posting over in net.arch in the next few days]:

System			Whetstone	Dhrystone	Linpack
			DP Megawhets			FORT DP MegaFlops

Sun3/260, 68881		1.24		 6,362		.11
(25MHz 68020, 20MHz 68881)
Intergraph IP32C	1.74		 8,309		.29
(30MHz Clipper)
MIPS M/500 		4.45		10,300		.58
(8MHz R2000+R2010 FPU)
VAX 8650, VMS:		4.00		10,787		.70	for context
MIPS M/800, 12.5MHz	6.90		15,300		.80
(12.5MHz R2000+R2010)
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

ericson@uiucdcsp.UUCP (03/28/87)

>In article <362@sbcs.UUCP> root@sbcs.UUCP (Root) writes:
[most deleted]
>>is on the other foot and Motorola seems to be running the smear campaign.
>
><mash@mips.UUCP> responds:
[beginning deleted]
>highest-performance 32-bit microprocessor...The MC68020 is still the
>highest-performance microprocessor..."  They cited Whetstones & Dhrystones.
                     ^^^^^^^^^^^^^^
>Draw your own conclusions from the following [a slice of a Performance Brief
>I'm posting over in net.arch in net few days]:
[John then goes to show some stats for some hot machines that make
the 68020 look slow..]

I would like to point out my highlight above - Motorola claims it is
the hottest Microprocessor.  Obviously, the Vax 8000 series are not
in this category.  Additionally, the Clipper is a board (with multiple
chips) device.  I think the same is true for the Mips machines.  One
should also remove the FP's when only comparing the CPU's.  C'mon,
let's compare these things fairly.

          
Stuart Ericson
{ihnp4,convex,pur-ee}!uiucdcs!ericson

mash@mips.UUCP (03/29/87)

In article <75900002@uiucdcsp> ericson@uiucdcsp.cs.uiuc.edu writes:
>
>><mash@mips.UUCP> responds:
>[beginning deleted]
>>highest-performance 32-bit microprocessor...The MC68020 is still the
>>highest-performance microprocessor..."  They cited Whetstones & Dhrystones.
>
>I would like to point out my highlight above - Motorola claims it is
>the hottest Microprocessor.  Obviously, the Vax 8000 series are not
>in this category.  Additionally, the Clipper is a board (with multiple
>chips) device.  I think the same is true for the Mips machines.  One
>should also remove the FP's when only comparing the CPU's.  C'mon,
>let's compare these things fairly.

I tried to compare as fairly as possible: the VAX (of course) is a
different animal, and I certainly wasn't suggesting it was a microprocessor!
If you look at the basic CPU complexes in the referenced processors,
using the example that the Moto ad cited [Sun3/260]:

1) Sun: 68020 + 68881 + 64K SRAM cache + bunch of SRAM for memory management +
tag comparison logic + other glue [I think].

2) Clipper: 1 CPU + 2 CAMMU [cache/MMU] chips, 4K each.  CPU includes FP.

3) M/500: CPU (R2000 chip) + FPU (R2010) + 24K cache + 4 address latches

4) M/800: CPU + FPU + 128K cache + 4 address latches.

How can you remove the FPs when they're comparing Whetstones?

Note that it is IMPOSSIBLE to make serious performance comparisons between
"bare" microprocessors, since the same "bare" micro can show radically
different performance when placed into different system environments.
(This can be seen often, where the same 16.7MHz 68020 gives different
performance according to the MMU strategy, memory system, etc.)
This set of examples was picked to be as comparable as possible,
i.e., honest-to-goodness low-chip-count micros in real systems.
BTW, I later received the Intel numbers that I think Moto was complaining
about, although it's hard to tell whether the complaints of an "unreal
hot-box" are fair or not - I make no judgements whatsoever.
Here are some numbers for a 20MHz Intel 80386+80387, MultiBus I,
64K write-thru cache, 2-3 wait states for cache miss [that's the part that
seems "hot"].  Green Hills 1.8.2G

7,810	Dhrystones (*note: probably used optimization)

1,730	KiloWhetstones, DP

.19	MFlops, DP Linpack, Rolled

Intel also has numbers for use with a Weitek 1167, but I think that
the 386+387 comparison is probably more comparable with the 68020/68881.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

ed@plx.UUCP (04/01/87)

In article <1466@ncr-sd.SanDiego.NCR.COM>, lodman@ncr-sd.SanDiego.NCR.COM (Mike Lodman) writes:
> In article <362@sbcs.UUCP> root@sbcs.UUCP (Root) writes:
> >What did you, fellow Usenetters, think of the recent Motorola advertisement
> >(April '87 Byte) about 68020 -vs- 80386?  Used to be that Intel dispensed
> >copious amounts of smoke when comparing their processors vs those from
> >Motorola.  It seems now that Intel has a somewhat serious machine, the shoe 
> >is on the other foot and Motorola seems to be running the smear campaign.
> 
> 
> Personally, I find Motorola's claims much more believable.
> 
> 
 Me too.  I'm told that if you want to take advantage of the 386's
 performance, you have to use some DISGUSTINGLY EXPENSIVE RAM.

 The other thing to remember is that the Motorola parts are SHIPPING at
 25MHz.  A 25MHz '020 blows the doors off a 16.? MHz '386.

 To me, the only advantage of the '386 is the virtual DOS machine capability.
 The ability to run DOS as a task under UNIX seems neat.

 Now the REAL screamer is CLIPPER.  The nice thing about CLIPPER is
 that you can really cut down on all those support chips (building an
 8k cache out of discrete components is EXPENSIVE).

 -ed-
> -- 
> Michael Lodman
> Advanced Development NCR Corporation E&M San Diego
> mike.lodman@SanDiego.NCR.COM 
> {sdcsvax,cbatt,dcdwest,nosc.ARPA,ihnp4}!ncr-sd!lodman

mash@mips.UUCP (04/02/87)

In article <580@plx.UUCP> ed@plx.UUCP (Ed Chaban) writes:
...discussion of 68K versus Intel merits....
> Now the REAL screamer is CLIPPER. The nice thing about CLIPPER is
> that you can really cut down on all those support chips (building an
> 8k cache out of discrete components is EXPENSIVE.  

Sigh.  If you can support that statement with live benchmarks of
substantial, real programs, please post them.  Even synthetic benchmarks
(beyond those in the Intergraph 12/86 report, I have that)
of any size would be useful, since it's VERY hard to find substantive
numbers that really support the Clipper performance claims (5 Mips,
average performance 5X an 11/780) on anything but Dhrystone and "toy"
benchmarks.

8K cache: expensive? we spend about $150 for 24K of cache.  Maybe that's
more expensive than a pair of 300K+ transistor Clipper CAMMUs, but I doubt it.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

davidsen@steinmetz.UUCP (04/02/87)

In article <580@plx.UUCP> ed@plx.UUCP (Ed Chaban) writes:
....
>> 
> Me too, I'm told that if you want to take advantage of the 386 performance,
> you have to use some DISGUSTINGLY EXPENSIVE  RAM. 

I have the PC Designs GV386 (16MHz at 0ws).  It uses a very small chunk
of static RAM (35ns) as cache, and the rest is relatively cheap memory.
I put a 2-1/2 MB ram card in using 120ns chips: $110 for the board,
$202.50 for the memory.  That's $125/MB, rather reasonable.  Running on
the 16 bit bus drops the speed by about 20%; when 1Mbit chips drop a
bit I'll put more on the motherboard.  Still, it runs about 3x a VAX,
which is acceptable for a PC.

The bottom line is that the base price with hard disk and display is
~$4k.  You just can't get a 68020 box for that price (the new Apple
looks more like $7k for the same size box, and that's a slow 68020).
Intel is starting to ship the cache controller, so the price of cache
will come down by 4th quarter (I assume).
>
> The other thing to remember is that the Motorola parts are SHIPPING at
> 25Mhz.  A 25Mhz '020 blows the doors off a 16.? Mhz '386.

So does a Cray2... compare on equal price or clock speed, please. The
actual performance is in a ratio of about 25:16 for Sun 3/260 and GV386.

-- 
bill davidsen			sixhub \
      ihnp4!seismo!rochester!steinmetz ->  crdos1!davidsen
				chinet /
ARPA: davidsen%crdos1.uucp@ge-crd.ARPA (or davidsen@ge-crd.ARPA)

dad@cipric.UUCP (04/03/87)

Thought some of you out there might be interested in this...I ran the
dhrystone benchmarks on an Intel 80386 Multibus II system (which by the
way is running AT&T S5R3), and got:
	5882 dryr
	5376 drynr
I had to change the standard 50000 passes to 500000 to get accurate
times.  With numbers like those, I could probably be converted to Intel
pretty quickly.  But I don't think I'll ever get over that funky addressing
technique.	-Dan
P.S. - This is about 1.8 times our Sun-3/160.

dillon@CORY.BERKELEY.EDU.UUCP (04/03/87)

>So does a Cray2... compare on equal price or clock speed, please. The
>actual performance is in a ratio of about 25:16 for Sun 3/260 and GV386.

	You mean "compare on equal memory cycle time, please," yah?
Remember, the crystal frequency doesn't mean a thing.  To be completely
accurate, you should also compare on internal cycle time for cache 
performance.

					-Matt

jon@eps2.UUCP (04/04/87)

In article <251@winchester.mips.UUCP>, mash@mips.UUCP (John Mashey) writes:
> In article <580@plx.UUCP> ed@plx.UUCP (Ed Chaban) writes:
> > Now the REAL screamer is CLIPPER. The nice thing about CLIPPER is
> > that you can really cut down on all those support chips (building an
> > 8k cache out of discrete components is EXPENSIVE.  
> 
> Sigh.  If you can support that statement with live benchmarks of
> substantial, real programs, please post them.  Even synthetic benchmarks

I was allowed to use an Intergraph 32/C at Fairchild to do some benchmarking.
I wanted to see how the Clipper would perform against a 68020 at "graphics"
operations.  The first program I ran simulates an airbrush.  You start out
with a piece of frame buffer and a "cell" density, which could be a Gaussian
distribution airbrush with some stipple.  The pixel becomes (density * color
of airbrush) + ((1 - density) * original pixel).  This was done 5000 times on
a 32 x 32 airbrush cell.  The times were:

Sun-3/160	cc -O	28.3
Intergraph 32/C	cc	30.3	(Green Hills compiler)

A bug in my program which did not affect the timing prevented using the
optimizer (-O, -O2) on the Intergraph.  I was surprised the Clipper was
slower.  It would probably match the 68020 with the -O2 option.

Then I ran a program which simulates a blit.  Basically it moves a megabyte
of memory as long words 32 times.  The times were:

Sun-3/160	cc -O	5.8
Sun-3/160 asm version	5.3
Intergraph 32/C	cc	7.3
Intergraph 32/C	cc -O	7.2
Intergraph 32/C	cc -O2	6.8
Intergraph 32/C asm ver	6.3

I had thought the burst loading of the cache would make the blit run
at least as fast as the 68020, but I was wrong.  Incidentally, the *p++ = *q++
becomes move.l (a0)+, (a1)+ on the Sun C compiler, but becomes five Clipper
assembly instructions with Green Hills.  Given 60ns machine cycles for the
68020, and 270ns memory cycles, I figure (3 + 2 wait states) * 60ns = 300ns
to read or write a long word from memory.  So the bandwidth of memory is
(1,000,000,000 ns/sec) * (1 long word / 300ns) * (4 bytes / longword) =
13,333,333 bytes/sec.  The Sun  was reading and writing 64M bytes in 5.3
seconds, so it was moving bytes up near the bandwidth of memory, which is
kind of nice.

Actually, we have hardware to do blits and airbrushes, and it runs (I would
guess) at least 10x faster than the 68020 or Clipper.  I should qualify that
by saying the blits are 10x faster if the CPU blits include multiple sources,
with look-up and ALU functions.

My conclusion was that the Clipper wouldn't perform well in our system as
a graphics processor.  Heck, it was actually slower than the 16.67MHz
68020.  When the AMD FAE was out here, he said that when the 29000
runs the same algorithms that are implemented in the QPDM (9560), it runs
them twice as fast.  That sounds pretty good: software flexibility to write
a stippled, brick-pattern airbrush, and run it at hardware speeds.

I read Fairchild's Performance White Paper.  They claim 3x the performance
of a 16.67MHz 68020.  Maybe on dhrystone, but not on my stuff.  They claim
8064 dhrystones, to the Sun-3/160's 2745.  I don't know about you people,
but my March 15th dhrystone says the Intergraph is 5275 and the Sun is 3246.
Their claim is that 8064 is with a new compiler.  That's pretty impressive:
a 60% gain from the compiler.  I guess the old one had some shortcomings, huh?
Can anyone with an Intergraph verify the 8064 number?  I think it would be
fairer if Fairchild compared the Intergraph to a Sun-3/260, not a 3/160,
anyway.

> 8K cache: expensive? we spend about $150 for 24K of cache.  Maybe that's
> more expensive than a pair of 300K+ transistor Clipper CAMMUs, but I doubt it.

I wish I could add a cache to a 68020 right now with just one chip, the
way Intel and AT&T can.  I'll have to wait for the 68030 because puny little
companies like us (don't let the DuPont name fool you, we're a subsidiary)
can't afford to design and build them.



Jonathan Hue	DuPont Design Technologies/Via Visuals		leadsv!eps2!jon
*Disclaimer: You're right, I don't know what I'm talking about*

Here are the programs I used:



unsigned char fb[0x10000];
unsigned short cell[1024];

/*
 * airbrush with multiply tables
 */
wrt_airb(bP, mP, cP, wx, wy, clr)
register unsigned char *bP, *mP;
register short *cP, wx;
short wy;
register unsigned char clr;
{
	register unsigned char pixel;
	register short d0, d1;
	register int j;

	while (wy--)  {
		j = wx;
		while (j--)  {
			d1 = *cP++;
			d0 = d1 + *bP;
			pixel = (d0 - mP[d0]) & 0xff;
			d1 += clr;
			pixel += mP[d1] & 0xff;
			*bP++ = pixel & 0xfe;
		}
		cP += (32 - wx);
		bP += (0x800 - wx);
	}
}


main()
{
	register int i;

	for (i = 0; i < 5000; i++)
		wrt_airb(fb, fb, cell, (short) 32, (short) 32, (char) 0);
}



long buffer[0x40001];

blit()
{
	register long *p, *q, i;

	p = buffer;
	q = p + 1;
	i = 16384;
	while (i--)  {
		*p++ = *q++; *p++ = *q++; *p++ = *q++; *p++ = *q++;
		*p++ = *q++; *p++ = *q++; *p++ = *q++; *p++ = *q++;
		*p++ = *q++; *p++ = *q++; *p++ = *q++; *p++ = *q++;
		*p++ = *q++; *p++ = *q++; *p++ = *q++; *p++ = *q++;
	}
}

main()
{
	register int i;

	for (i = 0; i < 32; i++)
		blit();
}

tim@ism780c.UUCP (04/04/87)

> To me, the only advantage of '386 is the virtual DOS machine capability.
> the ability to run DOS as a task under UNIX  seems neat.

The neat thing about the '386 is that it can run UNIX, unlike the 68020.
With the 68020 you have to put an MMU in your system.  However, hardware
guys at companies that make affordable computers seem to have this thing
against putting MMUs in.  With the '386, no silly hardware person can
leave out the MMU.

Of course, the hardware guy could divide the physical address space into
disjoint pages, each less than 4k bytes, if he really wanted to stop
me from running UNIX...
-- 
Tim Smith			"And if you want to be me, be me
uucp: sdcrdcf!ism780c!tim	 And if you want to be you, be you
Compuserve: 72257,3706		'Cause there's a million things to do
Delphi or GEnie: mnementh	 You know that there are"

brayton@yale.UUCP (04/04/87)

In article <21@cipric.UUCP> dad@cipric.UUCP (Dan A. Dickey) writes:
>Thought some of you out there might be interested in this...I ran the
>drystone benchmarks on an Intel 80386 Multibus II system (which by the
>way is running AT&T S5R3), and got:
>	5882 dryr
>	5376 drynr

I'm assuming that this is with a 16 MHz clock.  With a 20 MHz clock these
numbers would go up by 25%, giving even more impressive results.

			Jim Brayton
-----------------------------------------------------------------------------
brayton@yale.UUCP					brayton@yale.ARPA	
-----------------------------------------------------------------------------

henry@utzoo.UUCP (Henry Spencer) (04/05/87)

> While I am on the subject, did anyone else catch Scott McNealys (Sun Micr.
> Prez) regarding the '386 - they went something along the lines (rough
> quote): "no high performance Unix exists for the chip - therefore it
> has architectural problems".  I suppose the 68020 had the same archtectural
> problems when it first came out, but they just "got better"...

Despite the stigma of being a company president, the man may know what he's
talking about.  As I recall, the MMU in the 386 really does have at least
one serious architectural problem that makes high-performance multiprogramming
distinctly difficult.  The 68020 does not share said botch.  The way you've
quoted it reverses cause and effect, but that could be a misquote or a
misunderstanding.
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

mhorne@tekfdi.UUCP (04/06/87)

In a previous article...
>>The neat thing about the '386 is that it can run UNIX, unlike the 68020....
>
>	In terms of the dreaded 68020/80386 wars, and if I took a 
>neutral stance on the speed issue, it all comes down to a preference
>of which instruction set one likes best.  I personally (and this is why
>I'm on this newsgroup) like Motorola.  Not just for their 680X0 series,
>but for their single chip microcomputers as well.  The 68705 for instance.
>And as far as quality goes, Motorola is right up there with HP.

*** BINGO! ***

After writing extensive code on the 80X86 line in assembly, I must add that
Intel assembly (all CPUs) is a major pain in the ass.  It is simply
disgusting.  Comparing Motorola's 680X0 assembly and Intel's 80X86 assembly
is like comparing the proverbial 'apples and oranges', except Intel's
assembly is the apple, and it's rotten to the core...

Let's face it: the 80X86 line is a kludge that started way back with the
8080, and they refuse (or don't have the ability) to bring out a new machine
with modern instructions.  Have you ever seen another CPU (modern, that is)
that has instructions like Intel's?  And if so, is it popular?  IBM did
Intel an immense favor by using the 80X86 in its PC.  If they hadn't,
Intel wouldn't be around right now...

-- 
---------------------------------------------------------------------------
Michael Horne - KA7AXD                  UUCP: tektronix!tekfdi!honda!mhorne
FDI group, Tektronix, Incorporated      ARPA: mhorne@honda.fdi.tek.com
Day: (503) 627-1609                     HAMNET: ka7axd@k7ifg

clif@intelca.UUCP (04/07/87)

> In article <21@cipric.UUCP> dad@cipric.UUCP (Dan A. Dickey) writes:
> >Thought some of you out there might be interested in this...I ran the
> >drystone benchmarks on an Intel 80386 Multibus II system (which by the
> >way is running AT&T S5R3), and got:
> >	5882 dryr
> >	5376 drynr
> 
> I'm assuming that this is with a 16 MHz clock.  With a 20 Mhz clock these
> numbers would go up by 25% giving even more impressive results.
> 
> 			Jim Brayton
> -----------------------------------------------------------------------------
> brayton@yale.UUCP					brayton@yale.ARPA	
> -----------------------------------------------------------------------------

Those numbers are also using the PCC compiler; a good optimizing compiler
like Green Hills' C 1.8.2G or Metaware C-386 1.3 increases these numbers
by an additional 15-20%.


-- 
Clif Purkiser, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif

These views are my own property.  However anyone who wants them can have 
them for a nominal fee.
	

tomk@intsc.UUCP (04/10/87)

> In article <580@plx.UUCP> (Ed Chaban) writes:
> In article <1466@ncr-sd.SanDiego.NCR.COM>, lodman@ncr-sd.SanDiego.NCR.COM (Mike Lodman) writes:
>> In article <362@sbcs.UUCP> root@sbcs.UUCP (Root) writes:
>>>What did you, fellow Usenetters, think of the recent Motorola advertisement
>>>(April '87 Byte) about 68020 -vs- 80386?  Used to be that Intel dispensed
>> 
>> Personally, I find Motorola's claims much more believable.
>> 
>  Me too, I'm told that if you want to take advantage of the 386 performance,
>  you have to use some DISGUSTINGLY EXPENSIVE  RAM. 
> 
Not TRUE!!!  To run a near-0ws 16MHz machine you can do a small system
(<8MB) with 100ns DRAMs.  A larger system requires 80ns.  If you want
to put a cache controller in the design, it uses 45ns statics for the data
and 25 to 35ns statics for the tags, depending on the control circuit.  With
the forthcoming cache controller chip you can use 35ns SRAMs with a 20MHz CPU.

>  The other thing to remember is that the Motorola parts are SHIPPING at
>  25Mhz.  A 25Mhz '020 blows the doors off a 16.? Mhz '386.
> 
Show me a benchmark that does not fit in 256 bytes that even keeps up
with a 16MHz 386.  386's are now shipping at 20MHz for the speed freaks;
25MHz soon.

>  To me, the only advantage of '386 is the virtual DOS machine capability.
>  the ability to run DOS as a task under UNIX  seems neat.
> 
That plus speed too.  

>  Now the REAL screamer is CLIPPER. The nice thing about CLIPPER is
>  that you can really cut down on all those support chips (building an
>  8k cache out of discrete components is EXPENSIVE.  
> 
I will let the intergraph (the only known implementation) benchmark 
numbers speak for themselves.

I can see that a lot of people are going to start comparing the Compaq
386 machine against the Sun 3/260.  Just remember that the cost of a box
has a lot to do with how much performance optimization goes into the design.
The street price for a 386AT clone will be around $3500 by summer.  That is
compared to an $8000-$50000 Sun machine.  Price/performance is still a rule
that we are stuck with.  Or in other words, the CPU doesn't mean nearly as
much as the memory and I/O subsystems when you talk about performance.  I
will be the first one to admit that the IBM PC and its derivatives are a
kludge.  But don't blame the 386 for IBM's incompetence.

------
"Ever notice how your mental image of someone you've 
known only by phone turns out to be wrong?  
And on a computer net you don't even have a voice..."

  tomk@intsc.UUCP  			Tom Kohrs
					Regional Architecture Specialist
		   			Intel - Santa Clara

P.S.  If anyone wants to see a real 386 machine in action call your 
local sales office for a demo and get your benchmarks ready.

caf@omen.UUCP (04/11/87)

In article <930@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:

:Show me a benchmark that does not fit in 256 bytes thats even keeps up
:with at 16MHz 386.  386's are now shipping at 20MHz for the speed freaks.
:25MHz soon.

Well, here's one that takes 8k, somewhat larger than 256 bytes.  A 25 MHz
68020 board more than keeps up with an 18 MHz 386 box (let alone 16 MHz).

The Computer Dynamics 386 uses an Intel 386 motherboard goosed to 18 MHz,
apparently without ill effect.  The system uses all 32 bit Intel RAM (2.5
MB total including a 32 bit memory expansion board).  Note that the IBM top
of the line PS/2 20 MHz 386 machine is specified at one wait state, same as
my box.
I have been told that the best 68k C compilers usually beat the code density
of 8086/286 C compilers but have not verified this for myself.

Before Intel flames these numbers, I suggest they provide me a hotter
386 chip and/or 386 Unix, and I shall post updated numbers once I have
made sure they represent real systems.  My address is below.

I should also like to run this benchmark on the 386 in 286 pinouts chip
PC-WEEK announced a few weeks ago. I have two PC-AT machines ready
to go.


Compile - Link		Execute	Code
Real	User	Real	User	Bytes	System

7.4	.8	.34	.3416	124	Definicom SYS 68020 25mHz SiVlly 11/86
11.8	2.8	.56	.56	131	CompDyn (Intel MB) + 386 Toolkit 12/86

	Sieve benchmark (Slightly modified from Byte Magazine version)
		12-07-86 Chuck Forsberg Omen Technology Inc


NOTE: If the resulting time is too short to measure with a precision of
a couple of percentage points, increase the number of outer loops (n) to
100 or (if running on vaporware microcomputers) to 1000, and scale the
result accordingly.

siev.c:
#include <stdio.h>

#define S 8190
char f[S+1];
main()
{
/*	register long i,p,k,c,n;	For 32 bit entries for PC */
	register int i,p,k,c,n;
	for (n = 1; n <= 10; n++) {
		c = 0;
		for (i = 0; i <= S; i++) f[i] = 1;
		for (i = 0; i <= S; i++) {
			if (f[i]) {
				p = i + i + 3; k = i + p;
				while (k <= S) { f[k] = 0; k += p; }
				c++;
			}
		}
	}
	printf("\n%d primes.\n", c);
}

Chuck Forsberg WA7KGX Author of Pro-YAM communications Tools for PCDOS and Unix
...!tektronix!reed!omen!caf  Omen Technology Inc "The High Reliability Software"
  17505-V Northwest Sauvie Island Road Portland OR 97231  Voice: 503-621-3406
TeleGodzilla BBS: 621-3746 2400/1200  CIS:70007,2304  Genie:CAF  Source:TCE022
  omen Any ACU 1200 1-503-621-3746 se:--se: link ord: Giznoid in:--in: uucp
  omen!/usr/spool/uucppublic/FILES lists all uucp-able files, updated hourly

galen@oucs.UUCP (04/12/87)

In article <930@intsc.UUCP>, tomk@intsc.UUCP (Tom Kohrs @fae) writes:
> 
> I can see that a lot of people are going to start comparing the Compaq
> 386 machine against the Sun 3/260.  Just remember that the cost of a box
> has a lot to do with how much performance optimization goes into the design.
> The street price for a 386AT clone will be around $3500 by summer.  That is
> compared to an $8000-$50000 Sun machine.  Price performace is still a rule
> that we are stuck with.  Or in other words,  the CPU doesn't mean nearly as
> much as the memory and I/O subsystems when you talk about performance.  I 
> will be the first one to admit that the IBM PC and its derivatives are a 
> kludge.  But don't blame the 386 for IBM's incompetence.

Realistically, could a 386AT clone do everything that the Sun does (very
high-res graphics ... 8| ) for the same price and give the same performance?

It also seems to me that IBM ** OWNS ** the majority of the Intel stock???

I personally prefer the 68xxx processors to any Intel processor.  The
instruction set is better in my opinion, and the registers are more
versatile.

(Flame off)
BTW, why don't we set up a comp.sys.religious-wars or something and get this
out of here???
(Flame others and self!!! 8) 8) ... )

#include <standard-disclamer.h>
-- 
----S----N----A----R----K----S----&----B----O----O----J----U----M----S----
Douglas Wade Needham     (614)593-1567 (work) or (614)597-5969 (Home)
Electrical Engineering Dept., Ohio University, Athens, Ohio 45701 
UUCP: ...!cbatt!oucs!galen ** Smart Mailers: galen@pdp.cs.OHIOU.EDU

brewster@watdcsu.UUCP (04/12/87)

In article <532@pdp.cs.OHIOU.EDU>, galen@pdp.cs.OHIOU.EDU (Douglas Wade Needham) writes:
> 
> It also seems to me that IBM ** OWNS ** the majority of the Intel stock???
> 

	I think the figure is closer to 20%, and with the AT&T break-up I
	doubt that this figure will rise.  But even if IBM did own Intel, the
	implication that Intel processors (specifically the 8088) were built to
	IBM specs (ie. for intended use in PCs) seems ludicrous.

#include <standard-disclamer.h>
#include <standard-trademarknotice.h>
                                                   
						   Try not  to become  a  man
UUCP  : {decvax|ihnp4}!watmath!watdcsu!brewster    of success but rather  try
Else  : Dave Brewer, (519) 886-6657                to  become a  man of value.
                                                         Albert Einstein

djl@mips.UUCP (04/13/87)

In article <513@omen.UUCP>, caf@omen.UUCP (Chuck Forsberg WA7KGX) writes:
> In article <930@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
> 
> :Show me a benchmark that does not fit in 256 bytes thats even keeps up
> :with at 16MHz 386.  386's are now shipping at 20MHz for the speed freaks.
> :25MHz soon.
> 
> Well, here's one that takes 8k, somewhat larger than 256 bytes.  A 25 mHz
> 68020 board more than keeps up with a 18 mHz 386 box (let alone 16 mHz.)

He goes on to include a version of sieve, a classic small integer benchmark.
Please note, on the MIPS R2000 the entire main() procedure of this
benchmark (not just the inner loop, which is what counts) is 53
instructions, or 224 bytes.  Most of the total text size comes from printf().

It is safe to assume that on a CISC machine the inner loop fits quite
nicely in a 256 byte I-cache.  This is exactly what you must avoid if trying
to get a real handle on performance under actual conditions.

If what you care about is performance of a processor on very small
integer compute loops, then use sieve and its ilk.  If what you care
about is performance under actual application conditions, you must
use benchmarks that more accurately reproduce those types of environments.

-- 
			***dan

decwrl!mips!djl                  mips!djl@decwrl.dec.com

galen@oucs.UUCP (04/14/87)

In my response to article <930@intsc.UUCP> (Tom Kohrs @fae) I wrote
(<532@pdp.cs.OHIOU.EDU>)
> 
> It also seems to me that IBM ** OWNS ** the majority of the Intel stock???

My apologies to Steve McReady (sorry if I spelled it wrong).  My meaning
was that I heard a RUMOR somewhere, sometime several years ago, to this
effect.  The figures I heard were approx. 50-60 percent.  Steve McReady
says it is approx. 7 percent now.  Anyway, my main point was...

	WHAT DOES THIS HAVE TO DO (realistically) WITH THE M68K???

If we are going to have articles like this on the net, let's put them
someplace else.  (Also... Please note the FLAME SELF at the end of the
article!!!)  Maybe (heaven forbid! we have enough to keep track of and 
store!!!) we should create a "comp.sys.religious_wars" or something
(maybe place them under comp.sys.misc????).

Again, I apologize (also for my spelling...)

- (a 68xxx-biased, verily) douglas wade needham

-- 
----S----N----A----R----K----S----&----B----O----O----J----U----M----S----
Douglas Wade Needham     (614)593-1567 (work) or (614)597-5969 (Home)
Electrical Engineering Dept., Ohio University, Athens, Ohio 45701 
UUCP: ...!cbatt!oucs!galen ** Smart Mailers: galen@pdp.cs.OHIOU.EDU

tim@ism780c.UUCP (Tim Smith) (04/14/87)

In article <7872@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
<
< As I recall, the MMU in the 386 really does have at least one 
< serious architectural problem that makes high-performance
< multiprogramming distinctly difficult.

Could you supply more information, please?
-- 
Tim Smith			"Hojotoho! Hojotoho!
uucp: sdcrdcf!ism780c!tim	 Heiaha! Heiaha!
Delph or GEnie: Mnementh	 Hojotoho! Heiaha!"
Compuserve: 72257,3706

caf@omen.UUCP (04/15/87)

In article <285@winchester.mips.UUCP> djl@mips.UUCP (Dan Levin) writes:
:If what you care about is performance of a processor on very small
:integer compute loops, then use sieve and its ilk.  If what you care
:about is performance under actual application conditions, you must
:use benchmarks that more accurately reproduce those types of environments.

I thought many programs spend their time in fairly localized loops,
especially on machines that lack high-powered string instructions that
are useful to C.  What are "for" and "while" statements for?  Note that
you can have a 50-line for loop that still gets good cache hits if most
of the code is usually not executed.

So let's ask: how does the performance improvement provided by a small
cache on sieve compare with the performance improvement on ditroff
and tpscript, for example?

While it is conceivable that Motorola put the 256 byte cache in the 68020
just to help certain benchmarks,  it is more likely that the cache
actually improves performance rather inexpensively.

Does somebody have real data on how the 68020's cache improves performance
on sieve, troff, sort, and other Unix CPU hogs?

root@sbcs.UUCP (04/15/87)

> > While I am on the subject, did anyone else catch Scott McNealy's (Sun Micr.
> > Prez) comments regarding the '386 - they went something along the lines (rough
> > quote): "no high performance Unix exists for the chip - therefore it
> > has architectural problems".  I suppose the 68020 had the same architectural
> > problems when it first came out, but they just "got better"...
> 
> Despite the stigma of being a company president, the man may know what he's
> talking about.  As I recall, the MMU in the 386 really does have at least
> one serious architectural problem that makes high-performance multiprogramming
> distinctly difficult.  The 68020 does not share said botch.  The way you've
> quoted it reverses cause and effect, but that could be a misquote or a
> misunderstanding.
> -- 
> "We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
> the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

Well Henry, perhaps MIS Week misquoted McNealy - here is the exact quote
from pg 8, MIS Week for March 2, 1987:

	"No one has done a high performance version of Unix on a 386,
	 so there is a lot wrong with the 386."

I don't think I reversed cause and effect in my original posting.  BTW, in my
readings of the 386 architecture manual + data sheets I didn't find anything
horrendously wrong with the 386 MMU (i.e. no worse than the VAX/68851/National
MMUs :-).  Perhaps you would care to elaborate on the problem?

							Rick Spanbauer
							SUNY/Stony Brook

lamaster@pioneer.UUCP (04/15/87)

In article <518@omen.UUCP> caf@omen.UUCP (Chuck Forsberg WA7KGX) writes:
>In article <285@winchester.mips.UUCP> djl@mips.UUCP (Dan Levin) writes:
>:If what you care about is performance of a processor on very small
>:integer compute loops, then use sieve and its ilk.  If what you care
>:about is performance under actual application conditions, you must
>:use benchmarks that more accurately reproduce those types of environments.
>
>I thought many programs spend time looping in fairly localized loops,
>especially on machines that lack high powered string instructions that
>are useful to C.  What are "for" and "while" statements for?
>
>While it is conceivable that Motorola put the 256 byte cache in the 68020
>just to help certain benchmarks,  it is more likely that the cache
>actually improves performance rather inexpensively.

I agree.  Two points:

1)  To see whether a small cache is really going to help performance, you need
to look carefully at the memory reference pattern of generated code.  I assume
that Motorola did this.  A problem with small caches that are shared between
code and data is that data references can "take over" the cache and force
unnecessary memory references for instruction fetches, holding up instruction
issue on a pipelined machine.  A separate instruction cache is usually
indicated, in my opinion, with the additional benefit of doubling the "cache
bandwidth" without complicated logic.  The Seymour Cray/Control Data/ETA/Neil
Lincoln lineage of machines has always had small instruction caches (e.g. in
the range of 32-256 instructions) and NO data caches (but sometimes hundreds
of registers).  These semi-RISC load/store pipelined (including memory
references) machines demonstrated very good performance on a wide variety of
code; they did especially well on number crunching, where code segments that
fit in cache (the instruction "stack" on older machines - almost a cache ...)
often showed a scalar speedup of a factor of two or more on the many loops
that fit.  A small cache can do a great deal of good on a pipelined machine
if the net effect is to speed up instruction issue.

2)  The effect of data caches is much less pronounced or predictable.  I believe
that a majority of engineering/scientific codes and also system code have
widely scattered memory reference patterns.  So, I am much more sceptical of
the effect of data caches on real world problems.  However, a small data cache
is sometimes a useful substitute for registers (enter RISC debate). Data
caches do have an exaggerated effect on many of the popular small benchmarks
like Dhrystone, unfortunately.  The writers of these benchmarks usually take
the easy way out because it is hard to duplicate average code in a concise
benchmark.  And systems with data caches sometimes appear faster than they are
in the real world.  There are engineering/scientific benchmarks that do not
have so much of this problem (e.g. the Dongarra Linpack benchmark, with the
dimensions appropriately scaled).

There does seem to be a place out there for a better scalar/system code
benchmark.  I think a tree sort of a large random array is probably a better
overall benchmark (of the inherent CPU speed) than Dhrystone, but does not
provide as good coverage of the types of code that are usually encountered.
Byte addressing, extraction of fields from records, string searches, etc. are
usually considered desirable in a benchmark to ferret out architectural or
compiler weaknesses, but the correct way to test performance on these types of
codes is a matter of debate :-)


  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov

"In order to promise genuine progress, the acronym RISC should stand 
for REGULAR (not reduced) instruction set computer." - Wirth

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")

rod@cpocd2.UUCP (04/16/87)

In article <532@pdp.cs.OHIOU.EDU> galen@pdp.cs.OHIOU.EDU (Douglas Wade Needham) writes:
>It also seems to me that IBM ** OWNS ** the majority of the Intel stock???
>

IBM does not own a majority of Intel stock.  The last I heard it's no more
than 20% - I think it's 15%.  Obviously, IBM is a major customer, but by
no means are business decisions dictated by IBM - Andy G. still has
the final word.
-- 

	Rod Rebello
	...!intelca!mipos3!cpocd2!rod

tomk@intsc.UUCP (Tom Kohrs @fae) (04/17/87)

> In article <1517@ncr-sd.SanDiego.NCR.COM>  Michael Lodman (mike.lodman@SanDiego.NCR.COM) writes:
> In article <930@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:

> >I will be the first one to admit that the IBM PC and its derivatives are a 
> >kludge.  But don't blame the 386 for IBM's incompetence.
> 
> Will you also admit that the 8086 and its derivatives are a kludge?
> And I do blame Intel.
> 
Not at all!  The 8086 architecture was a real nice way of extending the 
address range back in the days when the dominant machines were the 8085 and
Z-80.  Swapping segment registers to change address was so much nicer than
doing bank selections or overlays.  The 286 architecture is considerably
nicer than what was the dominant 16-bit mini at the time, the PDP-11.  In
1980 the idea that you could get the CPU power of an 11/70 on a chip with
a cleaner memory management model got a lot of people excited.

What caused most of the frustration toward the 286 was that DEC and Motorola 
both went to a 32-bit programming model at that time.  Programmers quickly 
jumped to arms to adhere to the old maxim of using all of the available 
memory plus one byte.  When these neat new programs (i.e. BSD 4.x) were 
forced back down to the 16-bit architecture, things got tricky.  Many 
programmers decided that programming in a 32-bit environment required
less effort and less need for structure than the 16-bit environment, and
so to justify their not liking to work on 16-bit machines, the machines
were labeled as kludges or obsolete.

The only thing I could fault Intel for is possibly not going to a 32 bit
architecture sooner, but we were too busy building 80186's (5-6 million
sold so far).  Also we were learning about how to build an MMU (the hard
part) without having to debug 32-bit ALUs at the same time.  The design
decision that was made 9+ years ago was: do we build a slow 32-bit machine
or a fast 16-bitter?  Intel decided on the fast 16-bit; Motorola went for
the slow 32-bit.  The rest is history.

> I don't mean to imply that Intel has never had good microprocessors. The
> 8080A was a fine machine. But from that point on, Intel seemed to lose
> its edge, first to Zilog and the Z80, and then to Motorola and the 68000.
>
> I hope that the '386 is a step in the right direction for Intel...

I hope so too, now that I am a stock holder.

tomk@intsc.UUCP (Tom Kohrs @fae) (04/17/87)

> > While I am on the subject, did anyone else catch Scott McNealy's (Sun Micr.
> > Prez) comments regarding the '386 - they went something along the lines (rough
> > quote): "no high performance Unix exists for the chip - therefore it
> > has architectural problems".  I suppose the 68020 had the same architectural
> > problems when it first came out, but they just "got better"...
> 
> Despite the stigma of being a company president, the man may know what he's
> talking about.  As I recall, the MMU in the 386 really does have at least
> one serious architectural problem that makes high-performance multiprogramming
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For my edification and to prevent false innuendos please expound.

> distinctly difficult.  The 68020 does not share said botch.  
> -- 
> "We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
> the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

don@gitpyr.gatech.EDU (Don Deal) (04/17/87)

In article <932@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
>What caused most of the frustration toward the 286 was DEC and Motorola 
>both went to a 32bit programming model at that time. Programmers quickly 
>jumped to arms to adhere to the old maxim of using all of the available 
>memory plus one byte. When these neat new programs (ie BSD 4.x) were 
>forced back down to the 16 bit architecture things got tricky.  Many 
>programmers decided that programming in a 32 bit environment required
>less effort and less need for structure than the 16 bit environment and
>so to justify their not liking to work on 16 bit machines they were labeled
>as being kludges or obsolete.

  Oh, I get it.  It was lazy programmers who caused the negative comments
being made about the 286.  Thanks for clearing that up.  Next time I reload
a segment register to address more than 64k, I'll remember that I'm doing
structured programming.  Gee, come to think of it, all of the 8-bit 
microprocessors I used could only address 64k.  Maybe I've been a structured
programmer all along!

>The only thing I could fault Intel for is possibly not going to a 32 bit
>architecture sooner, but we were too busy building 80186's (5-6 million
>sold so far).  Also we were learning about how to build an MMU (the hard
>part) without having to debug 32 bit ALU's at the same time.  The design 
>decision that was made 9+ years ago was do we build a slow 32bit machine
>or a fast 16 bitter. Intel decided on the fast 16bit, Motorola went for
>the slow 32 bit.  The rest is history.

  Oh, use your imagination.  If you don't have one, here are a few of my
favorite gripes:

   - using single-byte opcodes for infrequently used instructions 

     AAA, AAS, DAA, DAS, LAHF, SAHF - 1 byte

   - using longer code sequences for commonly used operations 

     PUSH'ing more than two registers.  Having a register mask for
     push and pop would have been nice.  Using an eight bit mask would
     have allowed the most commonly-used registers to be pushed in
     one instruction - a sixteen-bit mask would take care of all of
     them.

     Not being able to move immediate data into a segment register.  How
     often have you seen:

       MOV  AX,data
       MOV  ES,AX

     This makes >64k addressing just *so* much more painful.

   - Selling silicon with bad bugs in it.  The 286 had problems with the POPF
     instruction and with one or more instructions in protected mode.  So
     many broken parts were sold that it was essentially impossible for
     software developers to make use of these instructions (assuming they
     knew about the problems).  Hearing word of a recall program for the 386
     makes me wonder how much of an improvement the 386 really is.

   - 8080 compatibility.  In this day and age, it's no longer an advantage;
     it's a liability.
-- 
D.L. Deal, Office of Computing Services, Georgia Tech, Atlanta GA, 30332-0275
Phone: (404) 894-4660   ARPA: don@pyr.ocs.gatech.edu  BITNET: cc100dd@gitvm1
uucp: ...!{akgua,allegra,amd,hplabs,ihnp4,masscomp,ut-ngp}!gatech!gitpyr!don

uhclem@trsvax.UUCP (04/17/87)

"Not another one."

All of this benchmarking-for-comparison may be pointless, particularly if you
happen to own a B0 or B1 386 (B1 == current production, or so we are told
by our sales rep).

Check your benchmarks, and if they are doing any 32-bit multiplication,
all bets are off.  There is a problem with some 386 chips and it causes
the 32-bit multiply to yield incorrect results.  Why didn't you guys at
Intel say anything about this in your responses?

What good is a timing benchmark when the program does not yield valid results?
Perhaps the benchmark should be weighted by accuracy too.  :-)

However, you may be lucky and have a 386 that does not goof up, or
your benchmark might not use that instruction. 

Intel's public statement says that they will not discuss yields on the
386 with regard to this problem.  The statement also said a new mask will be
in production in 2-3 months.

As for me, it's tough to program MMUs or paging systems when you can't
trust the numbers the system generates.  You tend to trap a lot.

<The above is sort of my own opinion.>

						"Thank you, Uh Clem."
						Frank Durda IV
						@ <trsvax!uhclem>

"Now, don't have too many bits turned on in your numbers or the
 barrel-shifter might get hot."

mhorne@tekfdi.UUCP (04/19/87)

[]

I am amazed at the number of folks from Intel Sales that subscribe to this
newsgroup!  Could it be they are afraid the 386 just won't make the grade?

There sure has been a lot of BS flying around here from certain individuals
working for a certain company...

						- MTH
						(my opinions, not Tek's)

-- 
---------------------------------------------------------------------------
Michael Horne - KA7AXD                  UUCP: tektronix!tekfdi!honda!mhorne
FDI group, Tektronix, Incorporated      ARPA: mhorne@honda.fdi.tek.com
Day: (503) 627-1666                     HAMNET: ka7axd@k7ifg

doug@edge.UUCP (Doug Pardee) (04/20/87)

> The design 
> decision that was made 9+ years ago was do we build a slow 32bit machine
> or a fast 16 bitter. Intel decided on the fast 16bit, Motorola went for
> the slow 32 bit.

I seem to recall Intel did design a slow (*very* slow) 32-bitter, the
iAPX432.  I would suggest that it was not an Intel decision to concentrate
on 16-bit CPUs, but rather the scandalous failure of the '432 that led to
Intel being thought of as a "manufacturer of 16-bit CPUs".

-- Doug Pardee -- Edge Computer Corp. -- Scottsdale, Arizona

geoff@desint.UUCP (Geoff Kuenning) (04/21/87)

In article <932@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:

> Not at all!  The 8086 architecture was a real nice way of extending the 
> address range back in the days when the dominant machine was the 8085 and
> Z-80.  Swapping segment registers to change address was so much nicer than
> doing bank selections or overlays.  The 286 architecture is considerably
> nicer than what was the dominant 16-bit mini at the time, the PDP-11.  In
> 1980 the idea that you could get the cpu power of an 11/70 on a chip with
> a cleaner memory management model got a lot of people excited.  

What a blatant misrepresentation:

    (1) The PDP-11 is a 1960's architecture.  The 8086 was designed in the
	late 70's, *after* the publication of Gordon Bell's book where he
	states, "the most embarrassing thing about the PDP-11 was that, only
	two years after its introduction, we found it necessary to widen
	the processor address."  But did Intel learn from DEC's mistake?
    (2) The *only* place where the PDP-11 loses to the 8086 is in the 8086's
	clever use of instruction prefixes to provide bank switching (that's
	all segments are).  The PDP-11 doesn't have any registers with
	weird private characteristics.  By contrast, even on the 80386, if
	you want to shift by a variable shift count, it *has* to be in a
	particular register.  How flexible.
    (3) Although it is true that the 808x series was the dominant
	MICROPROCESSOR CHIP (not the dominant machine, by any stretch of
	the imagination), it is also true that there were *lots* of better
	architecture examples around.  The 6800 literally leaps to mind.
    (4) Tom also conveniently ignores what a loser the 8085 was.  Essentially
	no changes from the 8080.  Check out the 6809 by contrast, or compare
	the PDP-11/70 with the original 11, the 11/20.
    (5) The basic reason for the 8086's kludgey architecture was not because
	it was a better design than the 11/70.  It was because Intel saw
	binary compatibility with the 8080/8085 as a critical goal.  That's
	forgivable;  nobody at the time could have seen that MS/DOS was
	going to kill CP/M-86.  But contrast DEC's approach with the VAX.
	They got compatibility with no kludges, and their customers don't
	seem too unhappy.

> What caused most of the frustration toward the 286 was DEC and Motorola 
> both went to a 32bit programming model at that time. Programmers quickly 
> jumped to arms to adhere to the old maxim of using all of the available 
> memory plus one byte. When these neat new programs (ie BSD 4.x) were 
> forced back down to the 16 bit architecture things got tricky.  Many 
> programmers decided that programming in a 32 bit environment required
> less effort and less need for structure than the 16 bit environment and
> so to justify their not liking to work on 16 bit machines they were labeled
> as being kludges or obsolete.

Again a blatant misrepresentation, if not an out-and-out lie.  Having just
finished mentioning the difficulty of doing overlays on the PDP-11, Tom
now tries to sell us on the idea that it's lazy programming that creates
such big code.  And he conveniently ignores the need of many programs
for large data spaces.  I did a PDP-11 application once that called for
"arbitrarily complex" databases (within the limitations of disk space)
in the spec.  We did it, but the 64K restriction was a bloody *pain*.
I guess it's just "lazy programming" that kept me from doing arbitrary
complexity within 64K...

> The only thing I could fault Intel for is possibly not going to a 32 bit
> architecture sooner, but we were too busy building 80186's (5-6 million
> sold so far).  Also we were learning about how to build an MMU (the hard
> part) without having to debug 32 bit ALU's at the same time.

Funny, I don't recall Intel introducing an MMU before the 80386 (certainly
the garbage in the 286 doesn't qualify as an MMU, not given what competitors
were doing).  Guess the learning curve was pretty steep.  And 32-bit ALU's
are not much of an excuse;  there was something of a surfeit of examples,
and in any case widening an ALU is not what most people consider a difficult
problem.  (On the other hand, to be fair, the '386 MMU is a big winner.)
As to number of parts sold, the IBM 650 was pretty successful in its day,
too, but I wouldn't defend it as a modern architecture.

> The design 
> decision that was made 9+ years ago was do we build a slow 32bit machine
> or a fast 16 bitter. Intel decided on the fast 16bit, Motorola went for
> the slow 32 bit.  The rest is history.

One of Intel's favorite misrepresentations.  It handily ignores the
fact that the 68k is just as fast in "small model" and that it is *much*
faster in "large model" or even in any application where you have to store
numbers bigger than 64K (accounting?  what's that?).  And it also completely
ignores the 6809, which was pretty successful in its own right.

Back to used cars, Tom.  At least then your customers won't see through you.
-- 
	Geoff Kuenning   geoff@ITcorp.com   {hplabs,ihnp4}!trwrb!desint!geoff

tim@ism780c.UUCP (Tim Smith) (04/21/87)

In article <932@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
<
< doing bank selections or overlays.  The 286 architecture is considerably
< nicer than what was the dominant 16-bit mini at the time, the PDP-11.  In
< 1980 the idea that you could get the cpu power of an 11/70 on a chip with
< a cleaner memory management model got a lot of people excited.  

So how come you guys put the index in the upper 13 bits of a selector and
the GDT/LDT bit and the privilege bits in the low 3 bits?  If you had put
the index in the low 13 bits, then the LDT/GDT bit, and then the rest,
one could have gotten a 29 bit linear address space by setting up
non-overlapping 64k segments.

This would have eliminated the major complaint that people who program
in higher level languages have against the 286.

The only reason I can think of is that it saves having to shift the
index to get the offset into the GDT/LDT, but my friends who design
CPUs and do VLSI, and all that rot, tell me that this is no problem.
-- 
Tim Smith			"Hojotoho! Hojotoho!
uucp: sdcrdcf!ism780c!tim	 Heiaha! Heiaha!
Delph or GEnie: Mnementh	 Hojotoho! Heiaha!"
Compuserve: 72257,3706

phil@amdcad.AMD.COM (Phil Ngai) (04/22/87)

In article <652@desint.UUCP> geoff@desint.UUCP (Geoff Kuenning) writes:
.numbers bigger that 64K (accounting?  what's that?).  And it also completely
.ignores the 6809, which was pretty successful in its own right.

How did sales for the 6809 compare with the 808X family? Anyone know?
-- 
Phil Ngai, {ucbvax,decwrl,allegra}!amdcad!phil or amdcad!phil@decwrl.dec.com

hugo@gnome.cs.cmu.edu.UUCP (04/22/87)

In article <16294@amdcad.AMD.COM> phil@amdcad.UUCP (Phil Ngai) writes:
>
>How did sales for the 6809 compare with the 808X family? Anyone know?
>-- 

Well, I think the 6809 sold pretty well if only because Tandy used it in
their color computer line of little boxes.

So, I would say it was pretty successful, but not so big as the 808X,
unfortunately.

Pete
--
ARPA: hugo@cmu-cs-gandalf.arpa      BELL:412-681-7431
UUCP: ...!{ucbvax,ihnp4,cmucspt}!hugo@cmu-cs-gandalf.arpa
USPS: 5170 Beeler St., Pittsburgh PA 15213
QUOT: "What's that I smell? I smell home cooking.  It's only the river!"
			_ Talking Heads




toma@tekgvs.UUCP (04/23/87)

In article <652@desint.UUCP< geoff@desint.UUCP (Geoff Kuenning) writes:
<In article <932@intsc.UUCP< tomk@intsc.UUCP (Tom Kohrs @fae) writes:
<
<< Not at all!  The 8086 architecture was a real nice way of extending the 
<< address range back in the days when the dominant machine was the 8085 and
<< Z-80.  Swapping segment registers to change address was so much nicer than
<< doing bank selections or overlays.  The 286 architecture is considerably
<< nicer than what was the dominant 16-bit mini at the time, the PDP-11.  In
<< 1980 the idea that you could get the cpu power of an 11/70 on a chip with
<< a cleaner memory management model got a lot of people excited.  
<
<What a blatant misrepresentation:
<
<    (1) The PDP-11 is a 1960's architecture.  The 8086 was designed in the
<	late 70's, *after* the publication of Gordon Bell's book where he
<	states, "the most embarrassing thing about the PDP-11 was that, only
<	two years after its introduction, we found it necessary to widen
<	the processor address."  But did Intel learn from DEC's mistake?

The PDP-11 came out in 1971 (which makes it a 60's architecture like the
PDP-10 and IBM 360), and was (and probably still is) the dominant
minicomputer as of 1980.  The initial PDP-11 had a 64k address space.
The initial 80x86 had a 1-meg address space.

<    (2) The *only* place where the PDP-11 loses to the 8086 is in the 8086's
<	clever use of instruction prefixes to provide bank switching (that's
<	all segments are).  The PDP-11 doesn't have any registers with
<	weird private characteristics.  By contrast, even on the 80386, if
<	you want to shift by a variable shift count, it *has* to be in a
<	particular register.  How flexible.

Integer divides and multiplies (on models that had them) had to use even/odd
register pairs.  MMU registers were memory mapped (which I would call
"weird").  Early PDP-11s could not shift more than one bit per instruction.
The 80x86 instruction set has more serious register limitations than the
variable shift count (which is rarely used).  Even worse is its dedicated
use of SI, DI, and AX for the string instructions.

Segments can be used for memory protection/management and provide a convenient
way to pass data between processes on 80286 or 80386 processors.  The
unfortunate feature of segments is that they have no use in a UNIX environment.

The PDP-11 loses to the 8086 in price/performance.  It also suffers from
several non-compatible hardware floating point units and two types of
hardware integer multiply/divide units.

<    (3) Although it is true that the 808x series was the dominant
<	MICROPROCESSOR CHIP (not the dominant machine, by any stretch of
<	the imagination), it is also true that there were *lots* of better
<	architecture examples around.  The 6800 literally leaps to mind.

The 6800 was better for numeric calculation, but not as good for moving data
around.  But the 6502 and the 6809 were better yet!  Unfortunately look at
the sales figures.

<    (4) Tom also conveniently ignores what a loser the 8085 was.  Essentially
<	no changes from the 8080.  Check out the 6809 by contrast, or compare
<	the PDP-11/70 with the original 11, the 11/20.

Sure the instruction set was no better (go to a Z-80 for that), but the 8085
did reduce the system chip count -- it was mainly better from a hardware
point of view.


<    (5) The basic reason for the 8086's kludgey architecture was [...]
<	because Intel saw binary compatibility with the 8080/8085 as a 
<       critical goal.  That's
<	forgivable;  nobody at the time could have seen that MS/DOS was
<	going to kill CP/M-86.  But contrast DEC's approach with the VAX.
<	They got compatibility with no kludges, and their customers don't
<	seem too unhappy.

What binary compatibility??  It was sort of compatible from source, if you
ran it through a conversion program.  The result was sluggishly running
8086 programs (such as WordStar which on a good Z80 machine would run rings
around the 8086 version on an IBM-PC).  MS-DOS was a closer CP/M-80 clone
than CP/M-86 since only the former supported CP/M 80 style system calls
(that is what CALL 5 is for).  DEC's approach with the VAX matches that of
NEC's V20/30 chip which does maintain compatibility, and I only wish came
out five years sooner (and emulated Z80 as well).

<
<< What caused most of the frustration toward the 286 was DEC and Motorola 
<< both went to a 32bit programming model at that time. Programmers quickly 
<< jumped to arms to adhere to the old maxim of using all of the available 
<< memory plus one byte. When these neat new programs (ie BSD 4.x) were 
<< forced back down to the 16 bit architecture things got tricky.  Many 
<< programmers decided that programming in a 32 bit environment required
<< less effort and less need for structure than the 16 bit environment and
<< so to justify their not liking to work on 16 bit machines they were labled
<< as being kludges or obsolete.
<
<[...]I guess it's just "lazy programming" that kept me from doing arbitrary
<complexity within 64K...

OK, two examples of lazy C programming are: 1) the assumption that
sizeof(int)==sizeof(char *), and 2) using int instead of long for numbers
that need to be longer than 16 bits.


Tom Almy
Tektronix, Inc.

(If I had a choice, I would still be doing assembly language programming
on a PDP-11, with the best instruction set of all time, rather than a 
680x0 or 80x86, both of which have more kludges than I would have time
to talk about.)

doug@edge.UUCP (Doug Pardee) (04/23/87)

Picking nits:

> 	The PDP-11 doesn't have any registers with
> 	wierd private characteristics.

I'm no expert on the 11, but aren't registers 6 and 7 the Stack Pointer and
Program Counter?  Not that this voids the argument -- the '86s are much more
restrictive with their weird registers.

> 	... Intel saw
> 	binary compatibility with the 8080/8085 as a critical goal.  That's
> 	forgivable;  nobody at the time could have seen that MS/DOS was
> 	going to kill CP/M-86.

Actually, the 8080 compatibility was *more* crucial for MS/DOS than for
CP/M-86.  MS/DOS was designed from the outset to be 100% upward compatible
with good-ol' CP/M, so that CP/M programs could be mechanically translated
from 8080 code to 8086 code and they'd run.  (It worked, too.  Many of the
early PC programs were mechanically translated CP/M programs.)  For some
reason, DRI didn't consider compatibility to be important, and CP/M-86
wasn't upward compatible.

-- Doug Pardee -- Edge Computer Corp. -- Scottsdale, Arizona

ihm@nrcvax.UUCP (Ian H. Merritt) (04/24/87)

>In article <16294@amdcad.AMD.COM> phil@amdcad.UUCP (Phil Ngai) writes:
>>
>>How did sales for the 6809 compare with the 808X family? Anyone know?
>>-- 
>
>Well, I think the 6809 sold pretty well if only because Tandy used it in
>their color computer line of little boxes.
>
>So, I would say it was pretty successful, but not so big as the 808X,
>unfortunately.
>

Unfortunately indeed.  The 6809 was just a bit too late for its
market.  Had it come out 2 years earlier, MANY things would have been
different.  It was VASTly superior to the 8080/Z80 which dominated the
8-bit market at the time.  Architecturally, it is more consistent and
generally cleaner than the x86 garbage, but hampered by its 64K address
limit.  The x86 at least expanded that slightly, albeit by a
kludgy and almost unusable mechanism.  This made a difference, though, in
the choice of a CPU for the MS-DOG machines. (Woof!)

Pity things have taken the low road, but I perceive a trend toward a
somewhat improved future.  Maybe I am just dreaming.

Here's to a future without Intel (or at least their current
philosophy)...

Cheerz--
					<>IHM<>

chapman@fornax.uucp (John Chapman) (04/25/87)

>
[ comparison of 11's and 80x8y's ]
> 
> Integer Divides and multiplies (on models that had them) had to use even/odd
> register pairs.  MMU registers were memory mapped (which I would call "wierd".
> Early PDP-11s could not shift more than one bit per instruction.  The 80x86
> instruction set has more serious register limitiations than the variable
> shift count (which is rarely used).  Even worse are its dedicated use of SI,
> DI, and AX for the string instructions.

This is not peculiar to Intel, as the NSC 32xxx and DEC VAX machines
(and probably a lot of others) do it as well. Each dedicates particular registers
for specific (count, dst., src.) functions in string instructions. It
is the only way to have single instruction string operations that are
interruptible and resumable (which you obviously want) other than
perhaps putting the internal (non user visible) microcode registers
on the stack *every* time an interrupt happens (*yuck*).

.
.
.
 
> Tom Almy
> Tektronix, Inc.
> 
> (If I had a choice, I would still be doing assembly languange programming
> on a PDP-11, with the best intruction set of all time, rather than a 
> 680x0 or 80x86, both of which have more kludges than I would have time
> to talk about.)
Yup.

john chapman

-- 
{watmath,seismo,uw-beaver}!ubc-vision!fornax!sfulccr!chapman
                   or  ...!ubc-vision!sfucmpt!chapman

almquisk@rpics3b.UUCP (04/26/87)

> What caused most of the frustration toward the 286 was DEC and Motorola 
> both went to a 32bit programming model at that time. Programmers quickly 
> jumped to arms to adhere to the old maxim of using all of the available 
> memory plus one byte. When these neat new programs (ie BSD 4.x) were 
> forced back down to the 16 bit architecture things got tricky.  Many 
> programmers decided that programming in a 32 bit environment required
> less effort and less need for structure than the 16 bit environment and
> so to justify their not liking to work on 16 bit machines they were labled
> as being kludges or obsolete.

Did Intel simply fail to notice what was happening to the price of memory?
16 bit machines are fine if you can't afford more than 64K of memory anyway.
The VAX and the 68000 made it possible to take full advantage of the cheaper
memory, but these days DEC sells memory for the PDP-11 in 1 Mbyte chunks.

It is true that 32 bit environments can save on programming effort, which is
very important these days since programming costs tend to exceed hardware
costs.  But that is only half the story.  I own an old 68000 box, and the
editor I use simply reads files being edited into space obtained by malloc.
If I modified the editor to keep the files being edited on disk, the result
would be *slow* because the disk has an 85 millisecond average access time.  
In the early 70's it was necessary to either shell out the money for fast
disk drives or else live with slow editors, but today it is possible to buy
lots of cheap RAM to speed up editing--if your CPU was designed to support
it.

> The 286 architecture is considerably nicer than what was the dominate
> 16bit mini at the time, the PDP-11.

But of course when the 286 came out the dominant mini of the time was the
32 bit VAX.  From the point of view of a person accustomed to working with
larger computers, the only really interesting chip that Intel has come out
with is the 432.  The problem with the 432 is that capability based systems
are still not very well understood.  It might be possible to build a nice
system based upon the 432, but today the idea is to build UN*X boxes and
you don't need a 432 to do that.
					Kenneth Almquist

geoff@desint.UUCP (Geoff Kuenning) (04/28/87)

In article <678@edge.UUCP> doug@edge.UUCP (Doug Pardee) writes:

> Picking nits:
> 
> > 	The PDP-11 doesn't have any registers with
> > 	wierd private characteristics.
> 
> I'm no expert on the 11, but aren't registers 6 and 7 the Program Counter and
> Stack Pointer?  Not that this voids the argument -- the '86s are much more
> restrictive with their weird registers.

Well, technically that's true.  The PDP-11 PC has the "special characteristic"
that it autoincrements after an opcode fetch (not, however, after an
operand-address or immediate-operand fetch -- see below).  The SP has two
special characteristics:  the JSR/RTS instructions use it as an implied
register, and the interrupt/RTI operations do the same.

As to the PC, DEC has a patent on "PC as a general register".  By making
the PC just another general register, you lose one register, but gain
a whole bunch in instruction-encoding simplicity.  For example, an immediate
operand is handled by just encoding the operand as (PC)+ -- the operand
is plucked from where the PC points, and the PC increments over it to
the next instruction stream element, nice as you please.  I predict that
when DEC's patent expires, you will see this feature in a lot of other
computers.

However, it is worth noting that *every* computer has "special registers"
of various sorts.  Somebody at Tek mentioned MMU registers, which I consider
a red herring.  There's also the PC and the PSW.  My point was that Intel's
GENERAL registers have special characteristics, and thus aren't really
general.
-- 
	Geoff Kuenning   geoff@ITcorp.com   {hplabs,ihnp4}!trwrb!desint!geoff

ihm@nrcvax.UUCP (Ian H. Merritt) (04/28/87)

>>
>[ comparison of 11's and 80x8y's ]
>> 
>> Integer Divides and multiplies (on models that had them) had to use even/odd
>> register pairs.  MMU registers were memory mapped (which I would call "wierd".
>> Early PDP-11s could not shift more than one bit per instruction.  The 80x86
>> instruction set has more serious register limitiations than the variable
>> shift count (which is rarely used).  Even worse are its dedicated use of SI,
>> DI, and AX for the string instructions.
>
>This is not peculiar to Intel as NSC 32xxx and Dec VAX machines do it 
>(probably a lot of others) as well. Each dedicates particular registers
>for specific (count, dst., src.) functions in string instructions. It
>is the only way to have single instruction string operations that are
	^^^^
>interruptible and resumable (which you obviously want) other than
>perhaps putting the internal (non user visible) microcode registers
>on the stack *every* time an interrupt happens (*yuck*).

The ONLY way?  Really?  First off, it is not clearly desirable to have
single instruction dedicated string operations, particularly if the
processor can execute a short loop just as fast (or faster), the
latter providing far greater flexibility.  If for some reason, you
must have a single instruction, though, it is perfectly reasonable to
have the instruction SPECIFY the source, destination, and count
registers (or whatever operands the instruction requires), thereby
allowing ANY GENERAL PURPOSE REGISTER (not something found on Intel
processors) to be used for any operand.

On another note, wrt the PDP-11 multiply/divide stuff, I was
particularly disturbed when they decided to have a mixed-endian model.
Basically, the PDP-11 is a little-endian processor where the low-order
byte of a word is in the lower numbered address, and bits are numbered
as powers of two.  When they had to store 32 bit numbers, for
multiplies and divides, they decided to deviate from this basically
consistent philosophy, storing the high order word in the lower
numbered address; the low word in the higher, but within these words,
the bytes are stored the other way.  Terrific.  The reasons for this
have oft been discussed, and I do not wish to spark another such
conversation, but it sure was confusing.


Cheerz--
							<>IHM<>

henry@utzoo.UUCP (Henry Spencer) (05/01/87)

> ... BTW, in my
> readings of the 386 architectural manual + data sheets I didn't find anything
> horrendously wrong with the 386 MMU ...
>  Perhaps you would care to elaborate on the problem?

Notice the absence of any way to flush a specific entry out of the TLB, as
opposed to flushing the whole thing.  Not a disaster, but a distinct problem
in certain applications.
-- 
"If you want PL/I, you know       Henry Spencer @ U of Toronto Zoology
where to find it." -- DMR         {allegra,ihnp4,decvax,pyramid}!utzoo!henry

chapman@fornax.uucp (John Chapman) (05/02/87)

.
.
<mild flames ahead>
> >> shift count (which is rarely used).  Even worse are its dedicated use of SI,
> >> DI, and AX for the string instructions.
> >
> >This is not peculiar to Intel as NSC 32xxx and Dec VAX machines do it 
> >(probably a lot of others) as well. Each dedicates particular registers
> >for specific (count, dst., src.) functions in string instructions. It
> >is the only way to have single instruction string operations that are
> 	^^^^
> >interruptible and resumable (which you obviously want) other than
> >perhaps putting the internal (non user visible) microcode registers
> >on the stack *every* time an interrupt happens (*yuck*).
> 
> The ONLY way?  Really?  First off, it is not clearly desirable to have

Yes the only way to have : single instruction, interruptible, and
resumable string instructions without putting extra microcode internal
registers on the stack at interrupts.  Yes it is obviously desirable
for string instructions to be resumable.

If you have to stack extra internal state then:

 1. either you only stack the stuff when a string instruction is
    interrupted, so your stack state on interrupt is dependent on
    the instruction being interrupted - this is undesirable, or

 2. you always stack that information whether it's needed or not,
    introducing a general overhead to interrupts to accommodate
    the occasional string instruction.

> single instruction dedicated string operations, particularly if the
> processor can execute a short loop just as fast (or faster), the
> latter providing far greater flexibility.  If for some reason, you

1. how would you code a string instruction like FFS (find first set bit)
   to fit in something like the 68010's single instruction looping
   feature/cache?

2. with good code cacheing and instruction pipelining you *might*
   get the same performance with a multi-instruction equivalent
   to the simpler string instructions; cases where single instructions
   execute slower than multi-instruction equivalents are more
   likely to be indicative of other problems, e.g. poor micro-coding
   of the single instruction.

> must have a single instruction, though, it is perfectly reasonable to
> have the instruction SPECIFY the source, destination, and count
> registers (or whatever operands the instruction requires), thereby
> allowing ANY GENERAL PURPOSE REGISTER (not something found on Intel
> processors) to be used for any operand.

But it *isn't* perfectly reasonable.  The guys who design these chips
aren't stupid you know - they do this stuff for a reason.  A good
pipelined cpu will have simultaneous opcode and operand decode.  The
architecture is optimized around a central issue like this; a two
address machine has its silicon designed to handle one or two operand
decodes efficiently.  So along comes a string instruction with 3 or 4
operands - now what do you do? If you want full operand generality
then the chip has to have real estate dedicated to handle these (few)
exceptional instructions with extra operands.

Why is it necessary to have full generality for these instruction
operands?  You know in advance what you will be using a calculation
result for so do it in the register it needs to be in. You have all
the other registers to do what you want with so why is it a problem
that (for example) the vax uses low numbered registers for string
operands? What exactly does it prevent you from doing?

Also part of the reason I posted my original response was that you
were flaming at Intel for dedicating certain registers for certain
functions. I have two problems with this:
 1. as originally pointed out this is hardly unique to Intel so
    if you are going to flame about it you better flame everybody
    else who does it too - unless of course you are biased.
 2. most people present this as "well you have to use SI as string
    source register" (implying that that is the only use for SI
    or DI, or BX or BP), rather than "well you can do general arithmetic
    etc. on SI and if you want to do a string instruction you use
    SI as the source operand".
 
.
.
. 
> 
> Cheerz--
> 							<>IHM<>

-- 
{watmath,seismo,uw-beaver}!ubc-vision!fornax!sfulccr!chapman
                   or  ...!ubc-vision!sfucmpt!chapman

dillon@CORY.BERKELEY.EDU (Matt Dillon) (05/03/87)

>1. how would you code a string instruction like FFS (find first set bit)
>   to fit in something like the 68010's single instruction looping
>   feature/cache?

	moveq.l	#31,D0
	move.l 	ea,D1
loop	asl.l	#1,D1
	dbcs 	loop

    Notes: You could just as easily specify 7, and use byte operations to 
    find the first set bit in a byte.  The routine above checks bits MSB
    first.  You can, of course, use lsr instead to check bits LSB first.
    (remember to take 32 - the count afterward for longword operations)
    You can add an outer loop to check any size item by using
    (A0)+ for the EA on the MOVE (and setting up A0) when loading the 
    initial data into D1, followed by, say, a BEQ if you also want to check for
    an end of string.  Remember that the outer loop would only execute
    once every 32 cached loops (97% of the instructions are cached) assuming
    you use longword operations.

    And, of course, it would all fit into a 68020's cache.  The DBcc 
    instruction is quite versatile.  Even though the 68000/68010's DBcc 
    (not sure about 68020) only supports word decrements on the data 
    register, it is quite simple to support 32 bit counts with the addition
    of an outer loop (another two instructions... a subi.l, and a branch).
    Maximum efficiency would give 65536 inner loops for every 1 outer loop,
    or a .0015 PERCENT loss over a DBcc which used the full 32 bits of the
    data register.

    P.S. I didn't test this code... but it's almost a children's exercise
    anyway.

					-Matt

dillon@CORY.BERKELEY.EDU (Matt Dillon) (05/03/87)

>>1. how would you code a string instruction like FFS (find first set bit)
>>   to fit in something like the 68010's single instruction looping
>>   feature/cache?
>
>	moveq.l	#31,D0
>	move.l 	ea,D1
>loop	asl.l	#1,D1
>	dbcs 	loop

	It also occurs to me that one could use a 256 byte lookup table and
then make the string length arbitrary:

	move.l	string_base,A0
	move.l	table_lookup_base,A1
	clr.l	D0
loop	move.b	(A0)+,D0
	move.b	0(A1,D0.W),D1
	beq	loop
	(D1 contains bit number.  If you want to stop at a \0, simply load
	 table entry 0 with some illegal bit number like $FF)

	Although this does not use a DBcc, and takes about 270 bytes more 
memory, it is quite a bit faster than the DBcc since we are checking bits
a byte at a time.

				-Matt

kds@mipos3.UUCP (Ken Shoemaker ~) (05/04/87)

and, of course, the win of having special registers for certain operations
is that operations with those registers can be made faster than in normal
circumstances.  For example, in the 386 (and 286) there is special hardware
in the bus interface unit that is used in the string move instruction to
effect string transfers at the bus bandwidth.  Trying to do this in a general
instruction loop would be difficult in that it would require looking at a
couple of instructions and decoding them as a group.

Also, with respect to specifying operands, the decoding of the instruction
itself is an important part of the execution of the instruction.  In the
8086 there isn't a way to specify any more than two different operands, and
if you specify two operands, one of them is a register.  Thus, for a string
move instruction, you'd be at a loss to try to specify all the operands
you need explicitly unless you increase the complexity of the instruction
decode.
-- 
The above views are personal.

...and they whisper and they chatter, but it really doesn't matter.

Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
uucp: ...{hplabs|decwrl|amdcad|qantel|pur-ee|scgvaxd|oliveb}!intelca!mipos3!kds
csnet/arpanet: kds@mipos3.intel.com

mash@mips.UUCP (John Mashey) (05/05/87)

In article <274@fornax.uucp> chapman@fornax.uucp (John Chapman) writes:
> ...Discussion of dedicated registers for string ops...
>
>Why is it necessary to have full generality for these instruction
>operands?  You know in advance what you will be using a calculation
>result for so do it in the register it needs to be in. You have all
>the other registers to do what you want with so why is it a problem
>that (for example) the vax uses low numbered registers for string
>operands? What exactly does it prevent you from doing?

In general, it's an OK tradeoff.  What it does make harder is the use
of more serious optimizing compilers.  In general, it is much easier
for ones that do good global optimization to compute results into
whatever registers are convenient.  Special cases can be lived with,
but the more there are, the harder it gets.  Compiler writers have
always disliked things like:
	register pairs needed for certain operations
	special registers used by some instructions
	unnecessarily asymmetric register sets
Again: not a disaster, and maybe a correct tradeoff, given all of the
other assumptions already built in, but not what one would like.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

herndon@umn-cs.UUCP (05/05/87)

In article <365@winchester.UUCP>, mash@mips.UUCP (John Mashey) writes:
> In article <274@fornax.uucp> chapman@fornax.uucp (John Chapman) writes:
> > ...Discussion of dedicated registers for string ops...
> >
> >Why is it necessary to have full generality for these instruction
> >operands?  You know in advance what you will be using a calculation
> >result for so do it in the register it needs to be in. You have all
> >the other registers to do what you want with so why is it a problem
> >that (for example) the vax uses low numbered registers for string
> >operands? What exactly does it prevent you from doing?
> 
> In general, it's an OK tradeoff.  What it does make harder is the use
> of more serious optimizing compilers.
>     . . . . . . . . . . . . . . . . . . . .  Compiler writers have
> always disliked things like:
> 	register pairs needed for certain operations
> 	special registers used by some instructions
> 	unnecessarily asymmetric register sets
> Again: not a disaster, and maybe a correct tradeoff, given all of the
> other assumptions already built in, but not what one would like.

  I'll second John on this.  Having used an early Intermetrics
optimizing compiler for the 8086, I truly respect those people
who had to write the optimizer for it.  In spite of very good
register allocation/instruction optimizations, there were so many
"gotcha's" in the processor that some of them inevitably got
through.  Among them were (from memory):
  1) AX was the only register usable with I/O instructions, and
     many instructions insisted on having one operand in AX,
     e.g., XLAT.
  2) BX was the only register usable as a stack offset
  3) CX always held the count for shift and string operations.
  4) DX got high-order results from the multiply, one of the
     multiply ops had to be in AX, and the low-order result
     was in AX.  (Sometimes this clobbered something.)
  5) Some of the instructions did not permit segment over-ride
     prefixes (still executed, but the instruction didn't work as
     anticipated.)
  6) Not all instructions set the condition codes as might be
     expected.
  While any one of these can easily (maybe) be anticipated, the
effect of so many is to overwhelm the poor optimizer writer.
EVERY SINGLE REGISTER on the 80X86 series has one or more special 
uses.  I readily admit that only one of the problems we found
was not documented in the processor manuals (the POPF instruction
had problems), but having soooooooooo many exceptions made the
advertising concept of "8086 general registers" an oxymoron.

-- 
Robert Herndon				Dept. of Computer Science,
...!ihnp4!umn-cs!herndon		Univ. of Minnesota,
herndon@umn-cs.ARPA			136 Lind Hall, 207 Church St. SE
herndon.umn-cs@csnet-relay.ARPA		Minneapolis, MN  55455

johnw@astroatc.UUCP (John F. Wardale) (05/08/87)

In article <639@mipos3.UUCP> kds@mipos3.UUCP (Ken Shoemaker ~) writes:
	... about specialized string instructions ...
>  For example, in the 386 (and 286) there is special hardware
>in the bus interface unit that is used in the string move instruction to
>effect string transfers at the bus bandwidth.  Trying to do this in a general
>instruction loop would be difficult in that it would require looking at a
>couple of instructions and decoding them as a group.

No, Ken, you're looking at this the wrong way.   You are ABSOLUTELY
right that this will be memory (bus) limited.  However, if you can
issue instructions fast enough (by having fewer, simpler
instructions) then a loop can also saturate the memory bus.

(This assumes an I-cache.  If you have to fetch each instruction
on a single memory bus, this will obviously lose!)

Average string length, data caches, MMU's etc. will complicate the
analysis.  Note that a page-fault in the middle of a single instr.
means restarting it, but in the loop case, you just re-issue the
instruction.  This simplification may also help you decrease your
processor's cycle time!

(In case you haven't guessed ... I like RISC)


			John W

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Name:	John F. Wardale
UUCP:	... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw
arpa:   astroatc!johnw@rsch.wisc.edu
snail:	5800 Cottage Gr. Rd. ;;; Madison WI 53716
audio:	608-221-9001 eXt 110

To err is human, to really foul up world news requires the net!

chapman@fornax.UUCP (05/08/87)

> >1. how would you code a string instruction like FFS (find first set bit)
> >   to fit in something like the 68010's single instruction looping
> >   feature/cache?
> 
> 	moveq.l	#31,D0
> 	move.l 	ea,D1
> loop	asl.l	#1,D1
> 	dbcs 	loop
> 
>     Notes: You could just as easily specify 7, and use byte operations to 
>     find the first set bit in a byte.  The routine above checks bits MSB
>     first.  You can, of course, use lsr instead to check bits LSB first.
>     (remember to take 32 - the count afterword for longword operations)
>     You can add an outer loop to check any size item by using
>     (A0)+ for the EA on the MOVE (and setting up A0) when loading the 
>     initial data into D1, followed by, say, a BEQ if you also want to check for
>     an end of string.  Remember that the outer loop would only execute
>     once every 32 cached loops (97% of the instructions are cached) assuming
>     you use longword operations.
> 
.
. 68020 comments
. 
>     P.S. I didn't test this code... but it's almost a children's exercise
>     anyway.
> 
> 					-Matt

Yes, well, the problem with children is they often make mistakes.

1. The code you have produced does not meet the original "challenge"
   I made to the original poster who was wondering why specific
   registers and why a single instruction.  Your code does not all
   fit in the 68010's cache.  Lest you think this is picky I will point
   out that I chose the simplest of the string instructions as an example;
   others would take even more code and use more registers.

2.  Duplicating the function has now used up a general purpose register.
    One of the advantages of the "all in one" type instruction is that
    internal temporaries can be used.

3.  You have not implemented FFS, you have implemented a much simpler version.
    FFS is (operand order may be wrong):

    FFS <base addr>,<offset>,<length>,<destination of count>

    <base addr> is a byte address of the bit string origin
    <offset> is a bit offset into the bit string at which the search
             begins; if the source is not a register this may be more than 32
             bits
    <length> is the length of the field to be checked in bits
    <destination of count> is where the bit position of the first set bit
                           is placed

-- 
{watmath,seismo,uw-beaver}!ubc-vision!fornax!sfulccr!chapman
                   or  ...!ubc-vision!sfucmpt!chapman

allen@ccicpg.UUCP (05/12/87)

>>1. how would you code a string instruction like FFS (find first set bit)
>>   to fit in something like the 68010's single instruction looping
>>   feature/cache?
> 	moveq.l	#31,D0
> 	move.l 	ea,D1
> loop	asl.l	#1,D1
> 	dbcs 	loop
> 



A minor point.  According to "M68000 16/32-Bit Microprocessor", Programmer's
Reference Manual, Fourth Edition, page 218, Table G-1:
	asl.l	#1,D1
is not listed as a loopable instruction.

allen

dillon@CORY.BERKELEY.EDU (Matt Dillon) (05/13/87)

:A minor point.  According to "M68000 16/32-Bit Microprocessor", Programmer's
:Reference Manual, Fourth Edition, page 218, Table G-1:
:	asl.l	#1,D1
:is not listed as a loopable instruction.
:
:allen
	
	According to table G-1, ASL certainly IS a loopable instruction.
Re-read your manual.... it's in the second column.  Furthermore, instruction
cycle times for ASL in loop mode are given on page 212, table F-13.

	No apology necessary.

				-Matt

terryl@tekcrl.TEK.COM (05/13/87)

In article <8705130713.AA24919@cory.Berkeley.EDU> dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
+:A minor point.  According to "M68000 16/32-Bit Microprocessor", Programmer's
+:Reference Manual, Fourth Edition, page 218, Table G-1:
+:	asl.l	#1,D1
+:is not listed as a loopable instruction.
+:
+:allen
+	
+	According to table G-1, ASL certainly IS a loopable instruction.
+Re-read your manual.... it's in the second column.  Furthermore, instruction
+cycle times for ASL in loop mode are given on page 212, table F-13.
+
+	No apology necessary.
+
+				-Matt

     Sorry, Matt, I suggest you re-read your manual, specifically, the
"Applicable Addressing Modes" column. The ONLY loopable shift instructions are
the shift instructions that shift MEMORY one bit at a time (word-sized), and
address the operand as (Ay), (Ay)+, or -(Ay). So allen is correct;

	asl.l	#1,D1

is NOT a loopable instruction.

dillon@CORY.BERKELEY.EDU (Matt Dillon) (05/14/87)

:     Sorry, Matt, I suggest you re-read your manual, specifically, the "Ap-
:plicable Addressing Modes" column. The ONLY loopable shift instructions are
:the shift instructions that shift MEMORY one bit at a time (word-sized), and
:address the operand as (Ay), (Ay)+, or -(Ay). So allen is correct;
:
:	asl.l	#1,D1
:
:is NOT a loopable instruction.

	Whoops.... you're right, I'm wrong.  Oh well, the algorithm wasn't
all that hot anyway.

			-Matt

cramer@infoswx.UUCP (05/15/87)

> >>1. how would you code a string instruction like FFS (find first set bit)
> >>   to fit in something like the 68010's single instruction looping
> >>   feature/cache?
> > 	moveq.l	#31,D0
> > 	move.l 	ea,D1
> > loop	asl.l	#1,D1
> > 	dbcs 	loop
> > 
> A minor point.  According to "M68000 16/32-Bit Microprocessor", Programmer's
> Reference Manual, Fourth Edition, page 218, Table G-1:
	> asl.l	#1,D1
> is not listed as a loopable instruction.
> 
> allen
>

For whatever it's worth, Table 4-11 of "68000 Microprocessor Handbook, 2nd Ed" 
(Osborne/McGraw-Hill) lists ASL as a loopable instruction.  It seems likely
to be one, in as much as it translates to a single word instruction (the basic
requirement for loopable instructions).

Bill Cramer
!ihnp4!infoswx!cramer

phils@tekigm2.UUCP (05/15/87)

In article <8705130713.AA24919@cory.Berkeley.EDU> dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
::A minor point.  According to "M68000 16/32-Bit Microprocessor", Programmer's
::Reference Manual, Fourth Edition, page 218, Table G-1:
::	asl.l	#1,D1
:           ^
::is not listed as a loopable instruction.
::
::allen
:	
:	According to table G-1, ASL certainly IS a loopable instruction.
:Re-read your manual.... it's in the second column.  Furthermore, instruction
:cycle times for ASL in loop mode are given on page 212, table F-13.
:
:	No apology necessary.
:
:				-Matt

	ASL is loopable only in the 'word' flavor.  Allen indicated 'long'.

				-Phil


-- 
-------------------------------------------------------------------------------
Phil Staub              tektronix!tekigm!phils    (206) 253-5634
Tektronix, Inc., ISI Engineering
P.O.Box 3500, M/S C1-904, Vancouver, Washington  98668

denny@dsndata.UUCP (Denny Page) (05/18/87)

Edition five of Motorola's reference manual does mention asl as a loopable
instruction.  However, the only applicable size and addressing modes listed
are as follows:

	asl.w	#1,(a1)

All shift/rotate instructions are listed with the same mode restrictions.

-- 

Denny	dsndata!denny		[Martha, the Clones are loose again!]